Abstract. We introduce the reactive synthesis competition (SYNTCOMP), a long-term effort intended to stimulate and guide advances in the design and application of synthesis procedures for reactive systems. The first iteration of SYNTCOMP is based on the controller synthesis problem for finite-state systems and safety specifications. We provide an overview of this problem and existing approaches to solve it, and report on the design and results of the first SYNTCOMP. This includes the definition of the benchmark format, the collection of benchmarks, the rules of the competition, and the five synthesis tools that participated. We present and analyze the results of the competition and draw conclusions on the state of the art. Finally, we give an outlook on future directions of SYNTCOMP.

1 Introduction


Ever since its definition by Church, the automatic synthesis of reactive systems from formal specifications has been one of the major challenges of computer science, and an active field of research. A number of fundamental approaches to solve the problem have been proposed (see e.g. [53]). Despite the obvious advantages of automatic synthesis over manual implementation and the significant progress of research on theoretical aspects of synthesis, the impact of formal synthesis procedures in practice has been very limited. One reason for this limited impact is the scalability problem that is inherent to synthesis approaches. The reactive synthesis problem is in general 2EXPTIME-complete for LTL specifications [53]. A number of approaches have recently been invented to solve special cases of the problem more efficiently, either by restricting the specification language [12], or by a smart exploration of the search space [35,59]. While important progress on the scalability problem has been made, an additional problem is the lacking maturity and comparability of implementations, and a lack of incentive for the development of efficient implementations [27]. Solving different aspects of this problem is the main motivation of SYNTCOMP, as explained in the following (inspired by [48]).


Synthesis tools are hard to compare. Research papers that introduce a new algorithm in many cases do include a comparison of its implementation against existing ones. However, the comparison of a large number of tools on a benchmark set of significant size can take weeks or months of computation time. This is often circumvented in research papers by comparing the new results to existing experimental data (usually obtained under different experimental conditions), or by comparing against a small number of tools on a small benchmark set. In both cases, this limits the value of the experimental results. In contrast, SYNTCOMP provides reliable results for a significant number of synthesis tools on a large benchmark set, with consistent experimental conditions.


It is hard to exchange benchmark sets. Related to the comparison of tools, we note that almost every existing tool uses its own input language, and benchmarks have to be translated from one format to another in order to compare different tools. This makes it hard to exchange benchmark sets, and adds another source of uncertainty when comparing tools. SYNTCOMP aims to solve these issues by defining a standard benchmark format, and by collecting a benchmark library that is publicly available for the research community.

Swen Jacobs et al.: The First Reactive Synthesis Competition (SYNTCOMP 2014)

Usability of synthesis tools. Implementations of many synthesis approaches do exist [14], but they cannot effectively be used as black-box solvers for applications. The definition of a standard language is a first step in this direction. In addition, the competition forces tool developers to produce implementations that are sufficiently robust to work on the complete benchmark library of SYNTCOMP with a fixed configuration. Thus, SYNTCOMP promotes the simplicity of use that comes with push-button approaches that do not require any user intervention.


Summing up, the goal of the reactive synthesis competition (SYNTCOMP) is to foster research in scalable and user-friendly implementations of synthesis techniques.


Related competitions. Competitions have been used to achieve these goals in many related fields, including automated reasoning [5,43,62] and automated verification. A difference between synthesis competitions and most of the competitions in automated reasoning or verification is that solutions to the synthesis problem can be ranked according to inherent quality criteria that go beyond mere correctness, such as reaction time or size of the solution. Thus, a synthesis competition also needs to measure the quality of solutions with respect to these additional metrics.

In parallel to SYNTCOMP 2014, the syntax-guided synthesis competition (SyGuS-COMP) was held for the first time. The focus of SyGuS-COMP is on the synthesis of functional instead of reactive programs, and the specification is given as a first-order logic constraint on the function to be synthesized, along with a syntactic constraint that restricts how solutions can be built. The goals of SyGuS-COMP are similar to those of SYNTCOMP, but for a fundamentally different class of programs and specifications.


Timeline. The organization of the first SYNTCOMP began formally with a presentation and discussion of ideas at the second Workshop on Synthesis (SYNT) in July 2013. The organization team consisted of Roderick Bloem, Rüdiger Ehlers and Swen Jacobs. The decision for the specification format was made and announced in August 2013, and a call for benchmarks, along with the rules of the competition, was published in November 2013. In March 2014 we published our reference implementation, and benchmarks were collected until the end of April 2014. Participants had to submit their tools by the end of May 2014, and the experiments for the competition were executed in June and July 2014. The results were first presented at the 26th International Conference on Computer Aided Verification (CAV) and the 3rd SYNT Workshop in July 2014.

See also: the Hardware Model Checking Competition, http://fmv.jku.at/hwmcc/. Accessed February 2016.


Goals. The first competition had the following goals:

- define a class of synthesis problems and a benchmark format that results in a low entry barrier for interested researchers to enter the competition

- collect benchmarks in the SYNTCOMP format

- encourage development of synthesis tools that support the SYNTCOMP format

- provide a lobby that connects tool developers with each other, and with possible users of synthesis tools

SYNTCOMP 2014 was already a success before the experimental evaluation began: within less than 10 months after the definition of the benchmark format, we collected 569 benchmark instances in 6 classes of benchmarks, and 5 synthesis tools from 5 different research groups were entered into the competition. For four of the tools, at least one of the developers was present at CAV and/or the SYNT workshop.

Overview. The rest of this article describes the background, design, participating solvers (called entrants), and results of SYNTCOMP 2014. We introduce the synthesis problem for safety specifications, as well as different approaches for solving it, in Section 2, and define the SYNTCOMP format in Section 3. The subsequent sections describe, in this order, the benchmark set for SYNTCOMP 2014, the rules of the competition, the entrants of SYNTCOMP 2014, some notes on the execution of the competition, and the experimental results together with their analysis.

Note that the sections describing the benchmarks and the entrants, along with parts of one further section, are based on the descriptions that the respective benchmark and tool authors had to supply in order to participate. The setup of the competition framework was taken care of by Timotheus Hell. The remainder of this article is original work of the SYNTCOMP organizers.

This article is based on the first description of the SYNTCOMP format and a preliminary version of the SYNTCOMP 2014 report.

2 Problem Description and Synthesis Approaches


Informally, the reactive synthesis problem consists of finding a system $C$ that satisfies a given specification $\varphi$ in an adversarial environment. In general, systems may be infinite-state (programs) or finite-state (circuits), and specifications can come in different forms, for example as temporal logic formulas or as monitor circuits.

For the first SYNTCOMP, we aimed for a low entry barrier for participants, and to keep the competition manageable in terms of tasks like the definition of input and output format, and the verification of results. To this end, we only consider the synthesis of finite-state systems from pure safety specifications modeled as monitor circuits. The monitor circuit reads two kinds of input signals: uncontrollable inputs from the environment, and controllable inputs from the system to be synthesized. It raises a special output BAD if the safety property $\varphi$ has been violated by the sequence of input signal valuations it has read thus far.

Then, the realizability problem is to determine if there exists a circuit $C$ that reads valuations of the uncontrollable inputs and provides valuations of the controllable inputs such that BAD is not raised on any possible execution. The synthesis problem is to provide such a $C$ if it exists. As a quality criterion, we consider the size of the produced implementation, which not only correlates to the cost of implementing a circuit in hardware, but often also leads to implementations which have other desirable properties, like short reaction time.


2.1 Synthesis as a Safety Game 


The traditional approach to reactive synthesis is to view the problem as a game between two players [54,63]: the environment player decides uncontrollable inputs, and the system player decides controllable inputs of the monitor circuit. States of the game are valuations of the latches in the monitor circuit. A state is safe if BAD is not raised. The goal of the system player is to visit only safe states, regardless of the environment behavior.


Game-based Synthesis. In a first step, a so-called winning region for the system player is computed. The winning region $W$ is the set of all states from which the system player can enforce the specification, i.e., from which it can guarantee that the environment cannot force the game into an unsafe state.

In a second step, a winning strategy is derived from the winning region. For every state and every valuation of the uncontrollable inputs, the winning strategy defines a set of possible valuations of the controllable inputs that can ensure that the winning region is not left.

The last step is to implement this strategy in a circuit, where a concrete choice for the controllable inputs has to be made for every state and valuation of uncontrollable inputs.

All of the tools in SYNTCOMP 2014 implement such a game-based synthesis approach, in one form or another.


Symbolic encoding. To achieve acceptable scalability, it is important to implement synthesis algorithms symbolically, i.e., by manipulating formulas instead of enumerating states. In synthesis, symbolic algorithms are usually implemented with Binary Decision Diagrams (BDDs). Most of the tools in SYNTCOMP 2014 use BDD-based approaches with different optimizations to achieve good performance in synthesis time and circuit size.

However, BDDs also have scalability issues, in particular the growing size of the data structure itself. Alternatively, the problem can be encoded into a sequence of propositional satisfiability (SAT), quantified Boolean formulas (QBF), or satisfiability modulo theories (SMT) problems. The enormous performance improvements in decision procedures for satisfiability over the last decades encourage such approaches.

In the following, we give a mostly informal description of the three synthesis techniques used by the tools that entered SYNTCOMP 2014: BDD-based game solving, incremental SAT- and QBF-based game solving, and template-based synthesis.


2.2 Preliminaries: Circuits and Games 

Let $\mathbb{B} = \{0,1\}$. If $X$ denotes a finite set of Boolean variables, then any $v \in \mathbb{B}^X$ is called a valuation of $X$. Sets of valuations of $X$ are represented by quantified Boolean formulas on $X$, which are made of propositional logic and first-order quantification on $X$. A formula $f$ with free variables $X$ will be written as $f(X)$, and for the same formula under a given valuation $v$ of $X$ we write $f(v)$. If the free variables are $X \cup Y$, we also write $f(X, Y)$. For a set of variables $X = \{x_1, \ldots, x_n\}$, we write $\exists X$ instead of $\exists x_1 \exists x_2 \cdots \exists x_n$, and similarly for universal quantification. For a set of variables $X = \{x_1, \ldots, x_n\}$, we use $X'$ to denote $\{x_1', \ldots, x_n'\}$, a set of primed copies of the variables in $X$, usually representing the variables after a step of the transition relation.

Then, the synthesis problem is given as a (sequential) monitor circuit $M$ over the sets of variables $L$, $X_u$, $X_c$, where

- $L$ are state variables for the latches in the monitor circuit,

- $X_u$ are uncontrollable input variables,

- $X_c$ are controllable input variables, and

- $\mathrm{BAD} \in L$ is a special variable for the unsafe states, i.e., a state is unsafe iff $\mathrm{BAD} = 1$.

We assume that the system has a unique initial state, in which all latches in $L$ including BAD are initialized to 0.

A solution of the synthesis problem is a sequential circuit $C$ with inputs $L \cup X_u$ and outputs $X_c$, such that the composition of $C$ and $M$ is safe, i.e., states with $\mathrm{BAD} = 1$ are never reached, for any sequence of (uncontrollable) inputs $X_u$ and starting from the unique initial state. The synthesis problem is depicted in Figure 1.

Note that a circuit defines a (Mealy-type) finite-state machine in the standard way. With the additional distinction between controllable and uncontrollable inputs and the interpretation of BAD as the set of unsafe states, it defines a safety game: the set of states is $\mathbb{B}^L$ (valuations of latches $L$), with initial state $0^L$. The transition relation of the monitor circuit can be translated into a formula $T(L, X_u, X_c, L')$ that relates valuations of $L, X_u, X_c$ to the valuation of (next-state variables) $L'$. In every turn of the game, first the environment player chooses a valuation of $X_u$, and then the system player chooses a valuation of $X_c$. The successor state is computed according to $T(L, X_u, X_c, L')$. A strategy of the system player is a function that maps the sequence of valuations of $L$ and $X_u$ seen thus far to a set of possible valuations for $X_c$. It is deterministic if it always maps to a unique valuation. A strategy is winning for the system player if it avoids entering the unsafe states regardless of the actions of the environment. Two-player safety games are determined, i.e., for every such game either the environment player or the system player has a winning strategy. A memoryless strategy only depends on the current values of $L$ and $X_u$. For safety games, there exists a winning strategy iff there exists a memoryless winning strategy. A deterministic memoryless winning strategy can be represented as a circuit, and thus provides a solution to the synthesis problem.

(For simplicity, we assume that BAD is a latch. If BAD is not a latch in the given monitor circuit, then it can be described as a formula $f(L, X_u, X_c)$. In this case we can obtain a problem in the described form by extending the circuit with a new latch that takes $f(L, X_u, X_c)$ as an input and provides output BAD.)

Fig. 1. Synthesis problem with monitor circuit $M$ and (unknown) system circuit $C$

2.3 BDD-based Game Solving

For a basic BDD-based algorithm, assume that the transition relation $T(L, X_u, X_c, L')$ and the sets of initial and unsafe states are each represented as a single BDD. To determine whether the environment has a strategy that guarantees its victory, one repeatedly computes the set of states from which it can force the game into the unsafe states. If $S(L')$ is a formula over the latches $L'$, representing a set of states, then the set of uncontrollable predecessors of $S(L')$ can be computed as the set of valuations of latches $L$ that satisfy

$$\mathrm{UPRE}(S(L')) = \exists X_u\, \forall X_c\, \exists L'.\; S(L') \wedge T(L, X_u, X_c, L').$$

To compute the winning region $W(L)$ of the system player, we first compute the least fixpoint of UPRE on BAD:

$$\mu S(L).\; \mathrm{UPRE}(S(L') \vee \mathrm{BAD}') \qquad (1)$$

The resulting set of latch valuations represents the states from which the environment can force the game into the unsafe states. Since two-player safety games are determined, the complement of this set is the winning region $W(L)$ for the system player.

Note that if during our fixpoint computation we notice that the environment can force the game from the initial state to the unsafe states, then we can stop: the specification is unrealizable. Otherwise, the initial state will be contained in the winning region $W(L)$, and $W(L)$ represents a non-deterministic strategy $\lambda$ for the system player, which can be described as a function $\lambda$ that maps a valuation $s \in \mathbb{B}^L$ of the latches and a valuation $\sigma_u \in \mathbb{B}^{X_u}$ of the uncontrollable variables to a set of possible valuations for the controllable variables:

$$\lambda(s, \sigma_u) = \{\sigma_c \in \mathbb{B}^{X_c} \mid \forall L'.\; T(s, \sigma_u, \sigma_c, L') \rightarrow W(L')\}.$$

To solve the synthesis problem, in principle any determinization of $\lambda$ can be chosen to obtain a functional strategy for the system player.

In order to compute the winning region efficiently and to find a strategy that can be represented as a small circuit, a number of optimizations can be used. We introduce some of the common optimizations in the following, in order to be able to compare and distinguish the participants that use a BDD-based approach.

Partitioned Transition Relation and Direct Substitution. To be efficient, the explicit construction of the BDD for the transition relation should be avoided. This can be achieved by partitioning the transition relation into a set of simpler relations, inspired by similar approaches for the model checking problem. A common approach is to split $T(L, X_u, X_c, L')$ into a set of (functional) relations $\{f_l(L, X_u, X_c)\}_{l \in L}$, where each $f_l$ represents the next-state value of latch $l$.

Then, the uncontrollable predecessor can be computed as

$$\mathrm{UPRE}(S(L')) = \exists X_u\, \forall X_c.\; S(L')[l' \leftarrow f_l(L, X_u, X_c)]_{l \in L},$$

which avoids ever building the monolithic transition relation, as well as ever declaring a next-state copy of any latch variable in the BDD manager. Substituting individual latches with functional BDDs is directly supported by existing BDD packages, e.g., function bddVectorCompose in the CUDD package. This will be called partitioned transition relation in the tool descriptions.

A special case of this approach is to identify those latches that only store the value of some other variable from the last step, i.e., the latch update function has the form $f_l = x$ for some $x \in L \cup X_u \cup X_c$, and use the substitution of latches by $f_l$ only for them. In this case, we only substitute with a single existing variable instead of a functional BDD, which can be done, e.g., with CUDD's bddVarMap method. This will be called direct substitution in the tool descriptions.
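The fixpoint computation of the winning region and the derived non-deterministic strategy can be illustrated on a tiny game. The following is a minimal sketch that enumerates states explicitly instead of using BDDs, so it only works for very small circuits. The function name `solve_safety_game` and the callbacks `step` and `is_bad` are hypothetical names chosen for this illustration, and the sketch removes unsafe states from the winning region explicitly, which is an assumption made here to keep the explicit-state version self-contained.

```python
from itertools import product


def solve_safety_game(n_latches, n_u, n_c, step, is_bad):
    """Explicit-state illustration of the UPRE fixpoint (no BDDs involved).

    step(s, u, c) returns the successor latch valuation as a tuple of 0/1,
    is_bad(s) tells whether BAD is raised in latch valuation s.
    """
    states = list(product((0, 1), repeat=n_latches))
    us = list(product((0, 1), repeat=n_u))   # valuations of uncontrollable inputs
    cs = list(product((0, 1), repeat=n_c))   # valuations of controllable inputs

    def upre(target):
        # s is an uncontrollable predecessor of target if some u exists
        # such that every choice of c leads into target.
        return {s for s in states
                if any(all(step(s, u, c) in target for c in cs) for u in us)}

    bad = {s for s in states if is_bad(s)}
    losing = set()
    while True:
        # Iterate S := UPRE(S or BAD) starting from the empty set.
        nxt = upre(losing | bad)
        if nxt == losing:
            break
        losing = nxt

    # Assumption of this sketch: unsafe states are excluded explicitly.
    winning = set(states) - losing - bad
    # Non-deterministic strategy: all c that keep the game inside winning.
    strategy = {(s, u): {c for c in cs if step(s, u, c) in winning}
                for s in winning for u in us}
    initial = (0,) * n_latches
    return initial in winning, winning, strategy


# Hypothetical example: a single latch BAD that is raised in the next step
# whenever the controllable input differs from the uncontrollable input.
def copy_game(s, u, c):
    return (int(u[0] != c[0]),)


realizable, winning, strategy = solve_safety_game(
    1, 1, 1, copy_game, lambda s: s[0] == 1)
print(realizable, winning, strategy)
```

In this example the system player wins by copying the uncontrollable input, so the strategy contains exactly one choice per state and input; any determinization of a larger strategy set would be an equally valid solution.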
BDD Reordering. Efficiency of BDD-based algorithms depends to a large extent on the size of BDDs, which in turn depends on the variable ordering. To keep the data structures small, reordering heuristics are commonly used to try to minimize BDDs at runtime [57]. Standard BDD packages come with pre-defined reordering strategies. Algorithms that do not use reordering at all are usually not competitive.

Efficient Computation of UPRE. In the fixpoint computation, we repeatedly use the UPRE operation to compute the set from which the environment wins the game. This operation consists of conjoining the current set of states with the transition relation, followed by resolving the quantification over inputs and current states to get the description of the set of predecessor states. The latter is called (existential or universal) abstraction over these variables. In practice, it is often preferable to not use this strict order, but instead do conjunction and abstraction in parallel. This is directly supported in some BDD packages, e.g., in CUDD's bddAndAbstract method. We will call this optimization simultaneous conjunction and abstraction.

Eager Deallocation of BDDs. Another optimization is to deallocate BDDs that are no longer needed as soon as possible. Not only do these BDDs take up memory, but more importantly they are also counted and analyzed for BDD reordering. Thus, removing such BDDs saves space and time. We will call this eager deallocation.

Abstraction-based algorithms. For systems that have a large state space, an abstraction approach may be more efficient than precise treatment of the full state space [36]. This can be done by defining a set of predicates $P$ (which may simply be a subset of the state variables of the system [26,47]) and computing over- and under-approximations of the UPRE function, with respect to the partitioning of the state space defined by the predicates in $P$. Computing fixpoints for these approximations is usually much cheaper than computing the precise fixpoint for the system. If the system player wins the game for the over-approximation of UPRE, then it also wins the original game. If the system player loses for the under-approximation of UPRE, then it also loses the original game. If neither is the case, then the abstraction is insufficient and needs to be refined by introducing additional predicates.

Extraction of Small Winning Strategies. To obtain from $\lambda$ a functional strategy that can be represented as a small circuit, a number of optimizations are commonly used. To this end, let $\lambda_c$ be the restriction of $\lambda$ to one output $c \in X_c$. We want to obtain a partitioned strategy, represented as one function $f_c(L, X_u)$ for every $c \in X_c$:

- For every $c \in X_c$, in some arbitrary order, compute the positive and negative co-factors of $\lambda_c$, i.e., the values $s, \sigma_u$ for which $\lambda_c(s, \sigma_u)$ can be 1 or 0, respectively. These can be used to uniquely define $f_c$, e.g., by letting $f_c(s, \sigma_u) = 0$ for all values in the negative co-factor, and $f_c(s, \sigma_u) = 1$ otherwise [11]. This will be called co-factor-based extraction of winning strategies.

- After extracting the functions $f_c(L, X_u)$ for all $c \in X_c$, one can minimize the strategy by doing an additional forward reachability analysis: compute the reachable states with this strategy, and restrict all $f_c$ to values of $L$ that are actually reachable.

- After translating the functions $f_c(L, X_u)$ into an AIG representation, a number of minimization techniques can be used to obtain small AIGs [50]. The verification tool ABC implements a number of these minimization strategies that can be used in a black-box fashion to obtain smaller circuits, and we will call this approach ABC minimization.
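The co-factor-based extraction in the first item can be sketched on an explicitly enumerated strategy. This is only an illustration under stated assumptions: the strategy is given as a Python dictionary rather than a BDD, the function name `extract_functions` is hypothetical, and the step that restricts the remaining choices after fixing each output is an assumption of this sketch about how the outputs are kept consistent with each other.

```python
def extract_functions(strategy, n_c):
    """Turn a non-deterministic strategy into one function per output.

    strategy maps a key (state, uncontrollable valuation) to a non-empty
    set of allowed controllable valuations, each a tuple of n_c bits.
    Returns a list of dictionaries, one per output, mapping keys to 0 or 1.
    """
    remaining = {key: set(choices) for key, choices in strategy.items()}
    functions = []
    for j in range(n_c):  # outputs are processed in index order
        # Negative co-factor: keys for which output j is allowed to be 0.
        negative = {key for key, choices in remaining.items()
                    if any(c[j] == 0 for c in choices)}
        # f_c is 0 on the negative co-factor and 1 otherwise.
        f = {key: 0 if key in negative else 1 for key in remaining}
        # Assumption: keep only the choices that agree with the fixed output.
        for key in remaining:
            remaining[key] = {c for c in remaining[key] if c[j] == f[key]}
        functions.append(f)
    return functions


# Hypothetical strategy with two keys and two controllable outputs.
example = {
    ("s0", "u0"): {(0, 1), (1, 0)},
    ("s0", "u1"): {(1, 1)},
}
print(extract_functions(example, 2))
```

Preferring 0 whenever it is allowed is one arbitrary determinization; choosing 1 on the positive co-factor instead would be equally valid and may give a different circuit size.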


2.4 Incremental SAT- and QBF-based Game Solving

In contrast to the BDD-based approaches already presented, the SAT- and QBF-based approaches start with a coarse over-approximation of the winning region, represented as a CNF formula $W(L)$ over the state variables $L$. This approximation is incrementally refined, such that $W(L)$ eventually represents the winning region symbolically.

More concretely, we initialize $W(L)$ to the set of all safe states $\neg\mathrm{BAD}$. In each iteration, the underlying solver is used to compute a state $s \models W(L) \wedge \mathrm{UPRE}(\neg W(L'))$ within the current candidate version $W(L)$ of the winning region from which the environment player can enforce leaving $W(L)$ in one step. Obviously, such a state cannot be part of the winning region. Hence, we refine $W(L)$ by removing this state. The state $s$ can be represented as a cube over the state variables $L$, so removing $s$ from $W(L)$ amounts to adding the clause $\neg s$.

In order to remove a larger region from $W(L)$, the algorithm tries to generalize the clause $\neg s$ by removing literals, as long as the resulting clause still only excludes states from which $\neg W(L)$ can be reached by the environment in one step. More specifically, writing $\hat{s}$ for the cube obtained from $s$ by dropping literals, literals are dropped as long as $(W(L) \wedge \hat{s}) \rightarrow \mathrm{UPRE}(\neg W(L'))$ holds, i.e., as long as every state of $W(L)$ that is excluded by the generalized clause $\neg\hat{s}$ can be forced out of $W(L)$ by the environment in one step. Once $W(L) \wedge \mathrm{UPRE}(\neg W(L'))$ becomes unsatisfiable, i.e., no more state exists from which the environment can enforce leaving $W(L)$, we have found the final winning region and the algorithm terminates.
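The refinement loop with clause generalization can be sketched as follows. This is a minimal illustration, assuming brute-force enumeration of states in place of SAT or QBF solver calls, so it is only meaningful for very small examples. The function name `refine_winning_region`, the callback `step`, and the representation of a literal as a pair (latch index, required value) are hypothetical choices made for this sketch.

```python
from itertools import product


def refine_winning_region(n_latches, n_u, n_c, step, bad_index):
    """Return W as a list of clauses; a clause is a frozenset of literals.

    A literal (i, v) is satisfied by a state s if s[i] == v.
    step(s, u, c) returns the successor latch valuation as a tuple.
    """
    states = list(product((0, 1), repeat=n_latches))
    us = list(product((0, 1), repeat=n_u))
    cs = list(product((0, 1), repeat=n_c))

    clauses = [frozenset({(bad_index, 0)})]  # W is initialized to "not BAD"

    def in_w(s):
        return all(any(s[i] == v for i, v in clause) for clause in clauses)

    def env_can_leave(s):
        # s satisfies UPRE(not W): some u forces the game out of W for all c.
        return any(all(not in_w(step(s, u, c)) for c in cs) for u in us)

    while True:
        # Stand-in for the solver call that finds s in W and UPRE(not W).
        s = next((t for t in states if in_w(t) and env_can_leave(t)), None)
        if s is None:
            return clauses  # W now represents the winning region
        clause = {(i, 1 - s[i]) for i in range(n_latches)}  # the clause "not s"
        for literal in sorted(clause):
            candidate = clause - {literal}
            # States of W that the generalized clause would exclude.
            excluded = [t for t in states
                        if in_w(t) and not any(t[i] == v for i, v in candidate)]
            if all(env_can_leave(t) for t in excluded):
                clause = candidate
        clauses.append(frozenset(clause))


# Hypothetical example with latches (BAD, x): x keeps its value, and BAD is
# raised in the next step if x is 1 and the uncontrollable input is 1.
def example_step(s, u, c):
    bad, x = s
    return (x & u[0], x)


print(refine_winning_region(2, 1, 1, example_step, bad_index=0))
```

In the example, the counterexample state has BAD = 0 and x = 1; generalization drops the literal on BAD, so the learned clause excludes all states with x = 1 at once.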

Implementation and Optimizations. A simple realization of this approach uses a QBF solver both to compute a state $s$ and to generalize the induced blocking clause $\neg s$. A generally more efficient approach is to use two competing SAT solvers for the two different quantifiers in UPRE when computing $s$. Other optimizations include the utilization of reachability information during the computation of $s$ and during the generalization of $\neg s$. A detailed description of different realizations and optimizations can be found in [13].

ABC: http://www.eecs.berkeley.edu/~alanmi/abc/ Accessed February 2016.

Extraction of Small Winning Strategies. To obtain an implementation from the winning region, different methods can be applied. One possibility is to compute a certificate for the validity of the QBF formula

$$\forall L, X_u\, \exists X_c, L'.\; W(L) \rightarrow (T(L, X_u, X_c, L') \wedge W(L'))$$

in the form of functions defining the variables in $X_c$ based on $L$ and $X_u$, using methods for QBF certification [52]. Another option is to use learning-based approaches that have also been proposed for BDD-based synthesis, but work particularly well in a SAT-/QBF-based framework [9,30]. Similar to the BDD-based methods for extracting small winning strategies, these learning approaches also compute solutions for one output $c \in X_c$ at a time. They start with a first guess of a concrete output function, and then refine this guess based on counterexamples.


2.5 Template-based Synthesis 

In order to compute a winning region, symbolically represented as a formula $W(L)$ over the state variables $L$, this approach constructs a parameterized CNF formula $W(L, P)$, where $P$ is a certain set of Boolean template parameters. Different concrete values for these parameters $P$ induce a different concrete CNF formula $W(L)$ over the state variables. This is done as follows. First, the approach fixes a maximum number $N$ of clauses. Then, for every clause and every state variable, it introduces parameters defining whether the state variable occurs in the clause, whether it occurs negated or unnegated, and whether the clause is used at all. This way, the search for a CNF formula over the state variables (the winning region $W(L)$) is reduced to a search for Boolean constants (values for the template parameters $P$). A QBF solver is used to compute template parameter values such that (a) the winning region contains only safe states, (b) the winning region contains the initial state, and (c) from each state of the winning region, the system player can guarantee that the successor state will also be in the winning region, regardless of the choice of the environment. This is done by computing a satisfying assignment for the variables $P$ in the QBF:

$$\exists P\, \forall L, X_u\, \exists X_c, L'.\; (W(L, P) \rightarrow \neg\mathrm{BAD}) \wedge (\mathit{Init}(L) \rightarrow W(L, P)) \wedge (W(L, P) \rightarrow (T(L, X_u, X_c, L') \wedge W(L', P))).$$

More details can be found in the literature on this approach.


3 Benchmark Format 


For the first SYNTCOMP, we have chosen to use an extension of the AIGER format that is already used in automatic verification and is suitable for our selected range of problems, as well as extendable to other classes of problems. Furthermore, the format poses a low entry barrier for developers of synthesis tools, as synthesis problems are directly given in a bit-level representation that can easily be encoded into BDDs and SAT-based approaches. In the following, we first recapitulate the AIGER format defined by Biere as the specification language for the hardware model checking competition (HWMCC). Then we show an extension of AIGER to a specification format for synthesis problems with safety specifications, developed for SYNTCOMP. Finally, we define how to use the AIGER format for solutions of synthesis problems in this setting.

3.1 Original AIGER Format 

The AIGER format was developed as a compact and simple file format to represent benchmarks for the hardware model checking competition (HWMCC). Benchmarks are encoded as multi-rooted And-Inverter Graphs (AIGs) with latches that store the system state. We use version 20071012 of the format. There is an ASCII variant and a more compact binary variant of the format. Since the binary format is more restricted and thus harder to extend than the ASCII format, we have chosen to work with the ASCII variant for SYNTCOMP. In the following, we explain the structure of AIGER files for model checking of safety properties.

A file in AIGER format (ASCII variant) consists of 
the following parts: 

1. Header, 

2. Input definitions, 

3. Latch definitions, 

4. Output definitions, 

5. AND-gate definitions, 

6. Symbol table (optional), and 

7. Comments (optional) 

Header. The header consists of a single line

aag M I L O A

where aag is an identifier for the ASCII variant of the AIGER format, M gives the maximum variable index, and I, L, O, A the number of inputs, latches, outputs, and AND gates, respectively.

In the rest of the specification, each input, latch, output, and AND gate is assigned a variable index i. To support negation, variable indices i are even numbers, and the negation of a variable can be referred to as i+1. Variable index 0 is reserved for the constant truth value false, and accordingly 1 refers to true. In the following, all numbers that represent inputs, latches, outputs or AND-gates need to be less than or equal to 2M+1.

AIGER: http://fmv.jku.at/aiger/ Accessed February 2016.

HWMCC: http://fmv.jku.at/hwmcc/ Accessed February 2016.
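The numbering scheme described above (even numbers for variables, the next odd number for their negation, 0 and 1 for the constants, and the bound 2M+1) can be captured in a few helper functions. This is a small sketch; the function names are hypothetical and not part of any AIGER tool.

```python
FALSE_LITERAL = 0  # constant false
TRUE_LITERAL = 1   # constant true, i.e., the negation of constant false


def negate(literal):
    """Flip the lowest bit: 6 becomes 7 and 7 becomes 6."""
    return literal ^ 1


def decode(literal):
    """Split a number into (even variable index, is_negated)."""
    return literal & ~1, bool(literal & 1)


def in_range(literal, max_var):
    """Check the bound 2M+1 for a header with maximum variable index M."""
    return 0 <= literal <= 2 * max_var + 1


print(decode(7), negate(6), in_range(10, 4))
```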

Input definitions. Every input definition takes one 
line, and consists of a single number (the variable in¬ 
dex of the input). Inputs are never directly negated, so 
they are always represented by even numbers. 

Latch definitions. Every latch definition takes one line, 
and consists of an even number (the variable index that 
represents the latch), followed by a number that defines 
which variable is used to update the latch in every step. 
Latches are assumed to have initial value 0. 

Output definitions. Every output definition takes one 
line, and consists of a single number (representing a pos¬ 
sibly negated input, latch, or AND-gate). For our class of 
(safety) problems, there is exactly one output, and safety 
conditions are encoded such that the circuit is safe if the 
output is always 0. 

AND-gate definitions. Every AND-gate definition takes 
one line, and consists of three numbers. The first is an 
even number, representing the output of the AND-gate, 
and is followed by two numbers representing its (possibly 
negated) inputs. 

Symbol table. The symbol table assigns names to inputs,
latches, and outputs. It is optional, and need not
be complete. Every line defines the name of one input,
latch, or output, and starts with i, l, or o, respectively,
followed by the position of the input, latch, or output in the
sequence of definitions (not its variable index, so the
first input is always defined by a line starting with i0,
and the first latch by a line starting with l0). This is followed by an
arbitrary string that names the variable.
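
The file layout described in this section can be read with a short parser. The following Python sketch is only an illustration under stated assumptions: the class and function names are hypothetical, latch lines are assumed to carry at least two numbers, and the comment section is assumed to start with a line containing only `c`, as in the AIGER format.

```python
from dataclasses import dataclass, field


@dataclass
class Aag:
    """Contents of an ASCII AIGER file (field names are illustrative)."""
    max_var: int
    inputs: list = field(default_factory=list)    # input literals
    latches: list = field(default_factory=list)   # (latch, next-state) pairs
    outputs: list = field(default_factory=list)   # output literals
    ands: list = field(default_factory=list)      # (lhs, rhs0, rhs1) triples
    symbols: dict = field(default_factory=dict)   # e.g. {"i0": "name"}
    comments: list = field(default_factory=list)


def parse_aag(text: str) -> Aag:
    lines = text.splitlines()
    header = lines[0].split()
    if len(header) != 6 or header[0] != "aag":
        raise ValueError("expected a header of the form 'aag M I L O A'")
    m, i, l, o, a = (int(x) for x in header[1:])
    aag = Aag(max_var=m)
    body = iter(lines[1:])
    aag.inputs = [int(next(body)) for _ in range(i)]
    for _ in range(l):
        parts = next(body).split()
        aag.latches.append((int(parts[0]), int(parts[1])))
    aag.outputs = [int(next(body)) for _ in range(o)]
    for _ in range(a):
        lhs, rhs0, rhs1 = (int(x) for x in next(body).split())
        aag.ands.append((lhs, rhs0, rhs1))
    # Remaining lines: optional symbol table, then optional comments.
    in_comments = False
    for line in body:
        if in_comments:
            aag.comments.append(line)
        elif line.strip() == "c":
            in_comments = True
        elif line[:1] in ("i", "l", "o"):
            key, _, name = line.partition(" ")
            aag.symbols[key] = name
    return aag


# Hand-written example file: two inputs, one latch, one output, one AND gate.
EXAMPLE = """aag 4 2 1 1 1
2
4
6 8
6
8 2 5
i0 request
i1 controllable_grant
l0 state
o0 error
c
hand-written example
"""
spec = parse_aag(EXAMPLE)
```

The sketch performs no validation beyond the header; a production parser would also check that all numbers respect the bound given by M.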

3.2 Modified AIGER format for synthesis 
specifications 

The SYNTCOMP format is a simple extension of the 
AIGER format for controller synthesis: we reserve the 
special string “controllable_” in the symbol table, and 
prepend it to the names of controllable input variables. 
All other input variables are implicitly uncontrollable. 

The synthesis problem defined by an extended AIGER
file is to find a circuit that supplies valuations for the
controllable inputs, based on valuations of uncontrollable
inputs and latches of the given circuit, such that
the output always remains 0.
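
As an illustration of how the reserved prefix separates the two kinds of inputs, the following Python sketch splits the input variable indices of a file into controllable and uncontrollable ones. The function name and the example file are hypothetical, and the comment section is assumed to start with a line containing only `c`.

```python
PREFIX = "controllable_"


def split_inputs(aag_text: str):
    """Return (controllable, uncontrollable) input indices of an aag file."""
    lines = aag_text.splitlines()
    _, _, i, l, o, a = lines[0].split()
    num_inputs = int(i)
    input_lits = [int(lines[1 + pos]) for pos in range(num_inputs)]
    # The symbol table starts after all input, latch, output and AND lines.
    symbols_start = 1 + num_inputs + int(l) + int(o) + int(a)
    controllable_positions = set()
    for line in lines[symbols_start:]:
        if line.strip() == "c":  # start of the comment section
            break
        key, _, name = line.partition(" ")
        if key[:1] == "i" and key[1:].isdigit() and name.startswith(PREFIX):
            controllable_positions.add(int(key[1:]))
    controllable = [lit for pos, lit in enumerate(input_lits)
                    if pos in controllable_positions]
    uncontrollable = [lit for pos, lit in enumerate(input_lits)
                      if pos not in controllable_positions]
    return controllable, uncontrollable


# Hypothetical specification: input 4 is controllable, input 2 is not.
SPEC = """aag 4 2 1 1 1
2
4
6 8
6
8 2 5
i0 request
i1 controllable_grant
"""
ctrl, unctrl = split_inputs(SPEC)
```

Inputs without a symbol table entry end up in the uncontrollable list, which matches the rule that all other input variables are implicitly uncontrollable.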

3.3 Output of synthesis tools in AIGER format 

Starting from an input as defined in Section 3.2, we define
when an AIGER file is a solution of the synthesis
problem. Informally, the solution must contain the
specification circuit, and must be verifiable by existing
model checkers that support the AIGER format. We give
a more detailed definition in the following.

3.3.1 Syntactic correctness 

Below we define how the input file can be changed in 
order to obtain a syntactically correct solution. Unless 
specified otherwise below, the output file must contain 
all lines of the input file, unmodified and in the same 
order. 

Header. The original header line
aag M I L O A
must be modified to
aag M' I' L' O A'
where

- I' = I - c
  (for c controllable inputs in the specification)

- L' = L + l
  (for l additional latches defined in the controller)

- A' = A + a
  (for a additional AND-gates defined in the controller)

- M' = I' + L' + A'

The correct value for c can be computed from the
symbol table of the input file, while correct values for l
and a depend on the number of latches and AND-gates
in the solution.
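
The header arithmetic can be written down directly. In this Python sketch the function name and argument names are our own; it computes I' = I - c, L' = L + l, A' = A + a and M' = I' + L' + A' from the original header line.

```python
def solution_header(spec_header: str, c: int, new_latches: int, new_ands: int) -> str:
    """Compute the header of a solution file from the specification header.

    c           -- number of controllable inputs in the specification
    new_latches -- number of latches added by the controller (l)
    new_ands    -- number of AND-gates added by the controller (a)
    """
    tag, _m, i, l, o, a = spec_header.split()
    if tag != "aag":
        raise ValueError("expected a header of the form 'aag M I L O A'")
    i_new = int(i) - c
    l_new = int(l) + new_latches
    a_new = int(a) + new_ands
    m_new = i_new + l_new + a_new
    return f"aag {m_new} {i_new} {l_new} {o} {a_new}"


# One controllable input is redefined by one new AND-gate.
header = solution_header("aag 4 2 1 1 1", c=1, new_latches=0, new_ands=1)
```

In the example, the single controllable input is replaced by one new AND-gate, so the maximum variable index stays at 4.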

Inputs. Definitions for uncontrollable inputs remain unchanged.
Definitions for controllable inputs are removed,
and the corresponding variable indices have to be redefined
either as new latches or AND-gates (see below).

Latches. No definitions of latches may be removed, but 
additional latches may be defined in the lines following 
the original latches. 

Outputs. No definitions of outputs may be removed, no 
additional outputs may be defined. 

AND-gates. No definitions of AND-gates may be removed,
but additional AND-gates may be defined in the
lines following the original AND-gates.

Global restrictions. All variable indices of controllable
inputs have to be redefined exactly once, either as a new
latch or as a new AND-gate. New latches and AND-gates
may be defined using the remaining (uncontrollable) inputs,
any latches, or newly defined AND-gates, but not
original AND-gates.

Symbol table and Comments. The symbol table remains
unchanged. Comments may be removed or modified
at will.

The reason for disallowing original AND-gates is that we want
the controller to work only based on the state of the given circuit
(i.e., values of latches), and the uncontrollable inputs. Original
AND-gates can be duplicated in the controller if necessary.

3.3.2 Semantic Correctness

All input files will have the same structure as single
safety property specifications used in HWMCC. In particular,
this means that there is only one output, and
the system is safe if and only if this output remains 0 for
any possible input sequence.

Any output file satisfying the syntactical restrictions
described in Section 3.3.1 is an AIGER file. It is correct
if for any input sequence (of the uncontrollable inputs),
the output always remains 0. We say that it is a solution
to the synthesis problem defined by the input file if it is
successfully model checked by an AIGER-capable model
checker within a determined time bound.
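
To make the correctness condition concrete, the following Python sketch simulates a circuit on one given input sequence and reports whether the output stays 0. It is not a model checker: correctness requires the output to remain 0 for all input sequences, whereas this sketch checks a single one. The function name and data layout are our own, and the AND-gates are assumed to be listed in topological order.

```python
def simulate(inputs, latches, output, ands, input_sequence):
    """Return True iff the output stays 0 along the given input sequence.

    inputs         -- list of (even) input variable indices
    latches        -- list of (latch, next) pairs; latches start at 0
    output         -- the single output, possibly negated
    ands           -- list of (lhs, rhs0, rhs1), assumed topologically sorted
    input_sequence -- list of dicts mapping each input index to 0 or 1
    """
    state = {latch: 0 for latch, _ in latches}
    for valuation in input_sequence:
        values = {0: 0}  # index 0 is the constant false
        values.update(state)
        values.update({lit: valuation[lit] for lit in inputs})

        def val(lit):
            # Look up the even variable index and apply negation if lit is odd.
            return values[lit - (lit % 2)] ^ (lit % 2)

        for lhs, rhs0, rhs1 in ands:
            values[lhs] = val(rhs0) & val(rhs1)
        if val(output) != 0:
            return False
        state = {latch: val(nxt) for latch, nxt in latches}
    return True


# Hypothetical circuit: latch 4 stores the previous value of input 2, and
# the output 6 = 4 AND 2 becomes 1 when the input is 1 in two successive steps.
inputs, latches, ands, output = [2], [(4, 2)], [(6, 4, 2)], 6
safe_run = simulate(inputs, latches, output, ands, [{2: 1}, {2: 0}, {2: 1}, {2: 0}])
unsafe_run = simulate(inputs, latches, output, ands, [{2: 1}, {2: 1}])
```

The first run keeps the output at 0, while the second one raises it in the second step.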

4 Benchmarks 

The benchmark set for the first SYNTCOMP consisted
of 569 benchmark problems overall, out of which 390
are realizable and 179 unrealizable. Most of the benchmarks
existed before in other formats, and have been
translated to our new format. The full set of benchmarks
used in SYNTCOMP 2014 is available in directory
Benchmarks2014 of our public Git repository at
https://bitbucket.org/swenjacobs/syntcomp/. In the following,
we first explain how benchmarks have been collected,
translated and tested, and then describe the different
sets of benchmarks.

4.1 Collection of Benchmarks 

One of the major challenges for the first SYNTCOMP
was the collection of benchmarks in the extended AIGER
format. Following the decision to use this format, a call
for benchmarks was sent to the synthesis community.
Many synthesis tools have their own benchmark set, but
none of them previously used the SYNTCOMP format,
and therefore such benchmarks had to be translated.
Since we currently restrict ourselves to safety specifications,
such a translation usually involves a safe approximation of
liveness by safety properties, and results in a family of
benchmark instances for different precisions of the
approximation.

Generation and Translation of Benchmarks. One
method for obtaining benchmarks in AIGER format is
based on a translation from LTL specifications, together
with a reduction to a bounded synthesis problem, as used
in Acacia+ [14]. The idea is to 1) translate the negation
of the LTL formula into a universal co-Büchi automaton,
2) strengthen this automaton into a universal
k-co-Büchi automaton that accepts a word w if and only
if all the runs on w visit rejecting states at most k times
(such an automaton defines a safety objective and can
be easily made deterministic), and 3) obtain a safety game
by succinctly encoding this deterministic safety
automaton as an AIGER specification. We thus obtain a
family of benchmark instances, one for each value of
k. If the original LTL specification is realizable, then the
resulting benchmark instance will be realizable for sufficiently
large k. This translation from LTL to AIGER has
been implemented by Guillermo A. Pérez in the ltl2aig
routine.

Numbers regarding realizability are to the best of our knowledge.
The realizability status has not been verified for all benchmark
instances.

http://lit2.ulb.ac.be/acaciaplus/ Accessed February 2016.

Another successful way of obtaining benchmarks was
to start from Verilog code, and use a toolchain composed
of the vl2mv routine of the VIS system [16], followed by
translation to AIGER (and optimization) by ABC [17],
and from binary AIGER format to ASCII format by the
aigtoaig routine from the AIGER tool set. Liveness
properties can be approximated by safety properties, and
we obtain a family of benchmark instances for different
approximations. Such an approximation is explained in
more detail in Section 4.3. This approach will be called
the Verilog toolchain below.

Finally, a number of benchmarks have been obtained
by translation from structured specifications for the generalized
reactivity(1) game solver SLUGS. The term
“structured” in this context refers to support for constraints
over (non-negative) integer numbers, which are
automatically compiled into a purely Boolean form. The
purely Boolean generalized reactivity(1) safety specification
is then translated into a monitor automaton in
AIGER format, which is finally optimized using the ABC
toolset by applying the command sequence rewrite. We
will call this approach the SLUGS toolchain below.

Testing and Classification of Benchmarks. To test
the resulting benchmarks, we fed them to our reference
implementation Aisy and compared the produced solution
to the expected result. Since our reference implementation
is not as efficient as the participants of the
competition, a significant number of benchmarks were
only solved during the competition, but not in our initial
tests. Those that were not solved were classified as realizable
or unrealizable according to informed guesses
of the benchmark authors. During the competition, this
resulted in 3 problem instances being re-classified from
unrealizable to realizable, or vice versa.


https://github.com/gaperez64/acacia_ltl2aig Accessed February 2016.
http://vlsi.colorado.edu/~vis/ Accessed February 2016.
http://www.eecs.berkeley.edu/~alanmi/abc/ Accessed February 2016.
http://fmv.jku.at/aiger/ Accessed February 2016.
http://github.com/ltlmop/slugs Accessed February 2016.
https://bitbucket.org/art_haali/aisy-classroom Accessed February 2016.

4.2 Toy Examples

These benchmarks are based on original Verilog specifications
that have been translated to AIGER using the
Verilog toolchain. The set includes specifications of basic
circuits like adders, bit shifters, multipliers, and counters.
Additionally, it contains some specifications with
typically very simple properties, e.g., that outputs must
match inputs, or that the XOR of inputs and outputs
must satisfy some property. All examples are parameterized
in the bit-width of the controllable and uncontrollable
inputs, ranging between 2 and 128 bits on some
examples, and for each example there are two versions,
using the optimizing and non-optimizing translation by
ABC, respectively. Overall, this set contains 138 benchmarks.

All AIGER files contain the original Verilog code, as
well as the commands used to produce the AIGER file,
in the comments section. This set of benchmarks was
provided by Robert Könighofer.

4.3 Generalized Buffer 

The well-known specification of an industrial generalized
buffer was developed by IBM and subsequently used as
a synthesis benchmark for Anzu and other tools. It
is parameterized by the number of senders which send
data to two receivers. The buffer has a handshake protocol
with each sender and each receiver. A complete Genbuf
consists of a controller, a FIFO, and a multiplexer.
In this benchmark, the FIFO and multiplexer are considered
as part of the environment, and the controller is
synthesized. As a synthesis case study for Anzu, it has
been explained in detail by Bloem et al. [10]. Robert
Könighofer translated these benchmarks to AIGER, as
explained in the following.

Liveness-to-Safety Translation. For Anzu, the Genbuf
benchmark contains Büchi assumptions $\{A_1, \ldots, A_m\}$
that are satisfied if all state sets $A_i$ are visited infinitely
often, and Büchi guarantees $\{G_1, \ldots, G_n\}$ requiring that
all $G_j$ are visited infinitely often if all assumptions are
satisfied. Three different translations into safety specifications
were performed. Translation “c” (for counting)
applies the well-known counting construction: A modular
counter $i \in \{0, \ldots, m\}$ stores the index of the next
assumption. If an accepting state $s \in A_i$ of this next
assumption is visited, the counter is incremented modulo
$m + 1$. If $i$ has the special value 0, it is always incremented.
The same counting construction is applied to
the Büchi guarantees with counter $j \in \{0, \ldots, n\}$.
Finally, a third counter $r$ is used to enforce a minimum
ratio between the progress in satisfying guarantees and
assumptions: Whenever $j$ is incremented, $r$ is reset to
0. Otherwise, if $i = 0$, then $r$ is incremented. If $r$ ever
exceeds some bound $k$, then BAD is set. A controller
enforcing that BAD cannot become 1 thus also enforces
that all $G_j$ are visited infinitely often if all $A_i$ are visited
infinitely often. Translation “b” (for bitwise) is similar
but uses one bit per assumption and guarantee instead
of a counter. It thus avoids imposing an (artificial) order
between properties. Translation “f” (for full set) is similar
to “b” but resets $r$ only if all guarantees have been
seen in a row (rather than only the next one).
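
The counting construction of translation “c” can be sketched as a small monitor. The Python code below is only an illustration: the class name, the interface (sets of indices of assumptions and guarantees whose accepting states contain the current state), and the order of updates within one step are our own assumptions. In particular, we assume that the test "i = 0" refers to the value of i before it is updated in the same step, and that BAD stays set once it has been raised.

```python
class CountingMonitor:
    """Safety monitor for m Büchi assumptions and n Büchi guarantees."""

    def __init__(self, m: int, n: int, k: int):
        self.m, self.n, self.k = m, n, k
        self.i = 0        # index of the next assumption, in {0, ..., m}
        self.j = 0        # index of the next guarantee, in {0, ..., n}
        self.r = 0        # progress counter compared against the bound k
        self.bad = False  # corresponds to the BAD signal

    def step(self, assumptions_hit: set, guarantees_hit: set) -> bool:
        i_was_zero = self.i == 0
        # Counter i: always incremented at 0, otherwise when A_i is visited.
        if self.i == 0 or self.i in assumptions_hit:
            self.i = (self.i + 1) % (self.m + 1)
        # Counter j: same construction for the guarantees.
        if self.j == 0 or self.j in guarantees_hit:
            self.j = (self.j + 1) % (self.n + 1)
            self.r = 0            # progress on guarantees resets r
        elif i_was_zero:
            self.r += 1           # assumptions completed a round without progress
        if self.r > self.k:
            self.bad = True
        return self.bad


# The environment keeps satisfying its single assumption, but the
# guarantee is never visited: BAD is eventually raised.
monitor = CountingMonitor(m=1, n=1, k=2)
trace = [monitor.step({1}, set()) for _ in range(7)]
```

If the guarantee is visited in every step instead, r is reset in every step and BAD is never raised.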

Translation to AIGER. Anzu comes with scripts to construct
Genbuf benchmark instances with different numbers
of senders. These scripts were modified to output a
Verilog representation, and from there the Verilog toolchain
was used to obtain benchmarks in AIGER format.
The final specification is parameterized in 1) the number
of senders which send data, 2) the type (c, b or f) and
the bound k of the liveness-to-safety translation, and
3) whether or not ABC optimizations are used in the
translation. All AIGER files contain the original Verilog
code (which in turn contains the Anzu specification it
was created from), as well as the commands used to produce
the AIGER file, in the comments section. Overall,
this set contains 192 benchmark instances.


4.4 AMBA Bus Controller 


This is a specification of an arbiter for the AMBA AHB
bus, based on an industrial specification by ARM. Like
the Genbuf case study, it has been used as a synthesis
benchmark for Anzu and other tools. It is parameterized
with the number of masters that can access the bus
and have to be coordinated by the arbiter. The AMBA
AHB bus allows masters to request different kinds of bus
accesses, either as a single transfer or as a burst, where
a burst can consist of either a specified or an unspecified
number of transfers. Besides correct modeling of these
different forms of accesses, the specification requires responses
to all requests (that are not eventually lowered),
as well as mutual exclusion of bus accesses by different
masters. As a synthesis case study for Anzu, it has been
explained in detail by Bloem et al. [10].

The Anzu specification has been translated by Robert
Könighofer to AIGER format using the Verilog
toolchain in the same way as for the Genbuf benchmark.
Instances are parameterized in the number of masters,
and (as for Genbuf) the type (c, b or f) and the bound
k of the liveness-to-safety translation, as well as whether
or not ABC optimizations are used in the translation.
All AIGER files contain the original Verilog code (which
in turn contains the Anzu specification it was created
from), as well as the commands used to produce the
AIGER file, in the comments section. Overall, this set
contains 108 benchmarks.

4.5 LTL2AIG Benchmarks


This set contains several benchmarks provided in the
Acacia+ tool package [14], translated into AIGER format
using the ltl2aig routine. The set includes:

- 50 benchmarks from the test suite included with the
  synthesis tool Lily [44], with specifications of traffic
  lights and arbiters in different complexity (25 original
  examples, each with 2 different choices of k).

- 4 versions of the Genbuf case study, but in a much
  more challenging form than the specification mentioned
  in Section 4.3. This version is only specified
  for 2 senders and 2 receivers, and for 4 different
  choices of k.

- 5 versions of a load balancer case study, originally
  presented with the synthesis tool Unbeast.

- 23 benchmarks that use the synthesis tool to obtain
  a deterministic Büchi automaton for the given LTL
  specification (if possible), and

- 18 benchmarks for a similar conversion from LTL to
  deterministic parity automata. The latter two conversions
  are mentioned as applications of synthesis
  procedures by Kupferman and Vardi [46].

Overall, this set contains 100 benchmarks.


4.6 Factory Assembly Line

This benchmark models an assembly line with multiple
tasks that need to be performed on the objects on
the conveyor belt. The specification models a number of
robot arms (fixed to 2), a number n of objects on the
conveyor belt, and a number m of tasks that may have
to be performed on each object before it leaves the area
that is reachable by the arms. The belt moves after a
fixed number k of time steps, pushing all objects forward
by one place, and the first object moves out of reach of
the arms (while a new object enters at the other end
of the belt). The arms are modeled such that they cannot
occupy the space above the same object on the belt,
and can move by at most one position per time step. In
particular, this means that they cannot pass each other.
Whenever an arm is in the same position as an object
that has unfinished tasks, it can perform one task on the
object in one time step. Usually, the assumption is that
at most m-1 of the m tasks need to be performed on any
object, but there may be a fixed number c of glitches in
an execution of the system, which means that an object
with m open tasks is pushed onto the belt.

This specification has been translated by Rüdiger
Ehlers from original benchmarks for the SLUGS GR(1)
synthesis tool, using the SLUGS toolchain. Overall, this
set contains 15 benchmarks, for different values of m (3
to 7), n (3 to 6), k (1 to 2) and c (0 to 11).

4.7 Moving Obstacle Evasion

This benchmark models a controller for a robot that
moves on a quadratic grid of parametric size m, and
has to avoid colliding with a moving obstacle (of size
2x2 grids). In any time step, the robot and the obstacle
can only move by at most one grid in x and y direction.
Additionally, the obstacle can usually only move at most
every second time step. However, as in the assembly line
benchmarks, there may be a fixed number c of glitches in
an execution of the system, which in this case means that
the obstacle moves even though it has already moved in
the immediately preceding time step.

This specification has been translated by Rüdiger
Ehlers from original benchmarks for the SLUGS GR(1)
synthesis tool, using the SLUGS toolchain. Overall, this
set contains 16 benchmarks, for different values of m (8
to 128) and c (0 to 60).

http://www.iaik.tugraz.at/content/research/opensource/lily/ Accessed February 2016.

We conjecture that this version of Genbuf is more challenging because
it is based on a large LTL specification, which is translated to a
single, very big Büchi automaton in the first step of the ltl2aig
routine. This results in a circuit that is much more complex than
the ones from Section 4.3.

http://www.react.uni-saarland.de/tools/unbeast/ Accessed February 2016.


5 Rules 

The rules for SYNTCOMP were inspired by similar competitions
such as the SAT competition and the HWMCC.
The basic idea is that submitted tools are evaluated on
a previously unknown set of benchmarks, without user
intervention. A simple ranking of tools can be obtained
by checking only the correctness of solutions, and counting
the number of problem instances that can be solved
within a given timeout. However, the goal of synthesis
is to obtain implementations that are not only correct,
but also efficient. Therefore, we also considered refined
rankings based on the quality of the produced solutions,
measured by the size of the implementation.

Tracks. The competition was separated into two tracks:
the realizability track which only required a binary answer
to the question whether or not there exists a circuit
that satisfies the given specification, and the synthesis
track which was only run on realizable benchmarks, and
asked for a circuit that implements the given specification.
The motivation for this split was again to have a
low entry-barrier for tool creators, as an efficient realizability
checker can be implemented with less effort than
a full synthesis tool that produces solutions and optimizes
them for size. Indeed, 2 out of 5 submitted tools
make use of this split and only supply a realizability
checker, and these two tools solve more problems in the
realizability track than any of the full synthesis tools.

Subtracks. Each track was divided into a sequential
subtrack, where tools can use only one core of the CPU,
and a parallel subtrack, allowing tools to use multiple
cores in parallel. The decision to have both sequential
and parallel execution modes was based on the expectation
that parallelization would often be trivial, i.e.,
a number of different but largely independent strategies
running in parallel. Therefore, we also wanted to evaluate
tools in sequential execution mode in order to measure
and identify the single best strategy.

5.1 Entrants 

We asked for synthesis tools to be supplied in source
code, licensed for research purposes, and we offered to
discuss possible solutions if this restriction was a problem
to any prospective participant. This was not the
case for any of the research groups that contacted us
regarding the competition. The organizers reserved the
right to submit their own tools and did so in the form of
Basil, implemented by R. Ehlers, and Demiurge, implemented
in part in the research group of R. Bloem. We
encouraged participants to visit SYNT and CAV for the
presentation of the SYNTCOMP results, but this was
not a requirement for participation.

We allowed up to 3 submissions per author and subtrack,
where submissions are considered to be different
if source code, compilation options or command line arguments
are different. This limit was chosen to allow
some flexibility for the tool creators, while avoiding the
flooding of the competition with too many different configurations
of the same tool. All tools must support the
input and output format of SYNTCOMP, as defined in
Sections 3.2 and 3.3. Additionally, each entrant to SYNTCOMP
was required to include a short system description.

The organizers committed to making reasonable efforts
to install each tool but reserved the right to reject
entrants where installation problems persisted. This was
not the case for any of the entrants. Furthermore, in
case of crashes or wrong results we allowed submission
of bugfixes if possible within time limits. In one case, a
bugfix was submitted that resolved a number of solver
crashes that only appeared during the competition runs.

5.2 Ranking 

In both the realizability and the synthesis track, competition
entrants were ranked with respect to the number
of solved problems. Additionally, we consider a more
fine-grained relative ranking that distributes points for
each benchmark according to the relative performance
of tools, measured either in the time needed to find a
solution, or the size of the solution. A drawback of this
relative ranking is that it does not allow easy comparison
to tools that did not participate. As an alternative
that resolves this problem, we additionally give a quality
ranking for the synthesis track that compares the size of
the provided solution to a reference size.

In particular, non-trivial parallelization is difficult for BDD-based
tools, since none of the existing parallel BDD packages supports
all features needed for the optimizations mentioned in Section 2.

For all rankings, a timeout (or no answer) gives 0
points. A punishment for wrong answers was not necessary,
since the full set of benchmarks was made available
to the participants one month before the submission of
solvers.

Correctness and Ranking in Realizability Track.
For the realizability track, the organizers and benchmark
authors took responsibility for determining in advance
whether specifications are realizable or unrealizable, by
using knowledge about how the benchmarks were generated.
When in doubt, a majority vote between all tools
that solved a given benchmark was used to determine
the correct outcome.

In addition to a ranking based on the number of 
solved problem instances, tools were evaluated with a 
relative ranking based on the time needed to come to 
the solution, where the tool with the smallest time earns 
the highest rank (see below). For the sequential subtrack, 
tools were ranked with respect to CPU time, while for 
the parallel subtrack we ranked tools with respect to 
wall-clock time. 

Correctness and Ranking in Synthesis Track. In
the synthesis track, correctness of solutions was assessed
by checking both syntactical and semantical correctness.
Syntactical correctness means conformance to our output
format defined in Section 3.3, which was checked by
a separate syntax checker. Semantical correctness was
tested by a model checker (iimc, based on the IC3 algorithm
[15]), which had to terminate within a separate
time bound for the result to be considered correct. As
in the realizability track, there is a ranking with respect
to the number of solved problem instances, as well as a
relative ranking. The latter is in this case based on the
size of solutions, given by the number of AND-gates in
the resulting circuit. In addition, we provide a quality
ranking that awards points for every solution, based on
a comparison of the solution size to a reference size (see
below).

Relative Ranking. For every benchmark, all tools that
provide a correct solution are ranked with respect to the
metric (time or size), and each tool obtains points based
on its rank. In detail: if k benchmarks are used in the
track, then $p = 1000/k$ points are awarded per benchmark.
If n tools solve the benchmark, then the points
for that benchmark are divided into $f = n(n+1)/2$ fractions,
and the tool which is at rank m will get
$\frac{n-m+1}{f} \cdot p$ points for this benchmark.

The quality ranking was devised for the second SYNTCOMP
and was applied to the results of the first competition only after
the presentation of results at SYNT and CAV 2014.

This rule only had to be used in one instance, where a benchmark
was solved by only one tool, and was reported to be realizable
although unrealizable was the expected outcome. In our analysis it
turned out that the tool was correct, and the initial classification
as unrealizable was wrong.

http://ecee.colorado.edu/wpmu/iimc/ Accessed February 2016.
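
The distribution of points for one benchmark can be sketched as follows. This Python example assumes that the p = 1000/k points are split into f = n(n+1)/2 fractions and that the tool at rank m receives n-m+1 of them; the function name is hypothetical, and the handling of ties (here: the arbitrary but stable order of Python's sort) is our own assumption, since the text does not specify it.

```python
from fractions import Fraction


def relative_points(num_benchmarks: int, results: dict) -> dict:
    """Distribute the points of one benchmark among the tools that solved it.

    num_benchmarks -- number k of benchmarks in the track
    results        -- maps each solving tool to its metric (time or size);
                      smaller values are better
    """
    p = Fraction(1000, num_benchmarks)       # points per benchmark
    n = len(results)
    f = Fraction(n * (n + 1), 2)             # number of fractions
    ranked = sorted(results, key=results.get)
    return {tool: Fraction(n - m + 1) / f * p
            for m, tool in enumerate(ranked, start=1)}


# Three hypothetical tools on one of 10 benchmarks (metric: seconds).
points = relative_points(10, {"A": 5.0, "B": 2.0, "C": 9.0})
```

With three solvers, the 100 points of the benchmark are split in the ratio 3:2:1, so the fastest tool obtains 50 points.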

Quality Ranking. In the quality ranking, solutions are
awarded points depending on the size $\mathit{size}_{new}$ of the
solution and a reference size $\mathit{size}_{ref}$. The number of points
for a solution is

$$2 - \log_{10}\left(\frac{\mathit{size}_{new}}{\mathit{size}_{ref}}\right).$$

That is, a solution that is of size $\mathit{size}_{ref}$ gets 2 points; a
solution that is bigger by a factor of 10 gets 1 point; a
solution that is bigger by a factor of 100 (or more) gets 0
points; and similarly for solutions that are smaller than
$\mathit{size}_{ref}$.

Since for the first competition we do not have reference
solutions for any of the problem instances, we use
the smallest size of any of the solutions of this competition
as the reference size. In future competitions, or for
comparison of tools that did not participate, the size of
the smallest solution that has been provided in any of
the competitions before can be used.
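
A small Python sketch of the quality score follows. The function names are our own. The lower bound of 0 points reflects the statement that a solution bigger by a factor of 100 or more gets 0 points, and the sketch assumes strictly positive sizes, so solutions with zero AND-gates would need separate treatment.

```python
import math


def quality_points(size_new: int, size_ref: int) -> float:
    """Points for a solution of size size_new against reference size_ref."""
    if size_new <= 0 or size_ref <= 0:
        raise ValueError("this sketch only handles positive sizes")
    return max(0.0, 2.0 - math.log10(size_new / size_ref))


def reference_size(solution_sizes) -> int:
    """Reference size as used for the first competition: the smallest solution."""
    return min(solution_sizes)


# Hypothetical solution sizes (number of AND-gates) for one benchmark.
sizes = {"tool1": 100, "tool2": 1000, "tool3": 50000}
ref = reference_size(sizes.values())
scores = {tool: quality_points(size, ref) for tool, size in sizes.items()}
```

Here the smallest solution gets 2 points, the solution that is 10 times bigger gets about 1 point, and the solution that is 500 times bigger gets 0 points.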


6 Participants 


Five systems were entered into the first SYNTCOMP. In
the following, we give a brief description of the methods
implemented in each of these systems. For the BDD-based
tools, Table 1 shows which of the optimizations
from Section 2.3 are implemented in which tool.


6.1 AbsSynthe: an abstract synthesis tool 


AbsSynthe was submitted by R. Brenguier, G. A. Pérez,
J.-F. Raskin, and O. Sankur from Université Libre de
Bruxelles. AbsSynthe implements a BDD-based synthesis
approach and competed in all subtracks.


Synthesis algorithms. AbsSynthe implements different
BDD-based synthesis algorithms, with and without
abstraction, described in more detail in [^. All algorithms
use the BDD package CUDD [60], with automatic
BDD reordering using the sifting heuristic.

The concrete algorithm with partitioned transition
relation (C-TL) implements BDD-based synthesis with
partitioned transition relation and direct substitution of
state variables with BDDs. In addition, when computing
$\mathrm{UPRE}(S(L))$, the transition functions $f_l$ of all
latches are first restricted to $\neg S(L)$, effectively only
computing the uncontrollable predecessors which are not
already in $S(L)$. These new states are then joined to $S(L)$,
which gives the same result as the standard UPRE computation.
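
The effect of this restriction can be illustrated on an explicit-state game. The Python sketch below is not the BDD-based implementation of AbsSynthe: it uses plain sets instead of BDDs, and it assumes the usual definition of the uncontrollable predecessor (states from which some uncontrollable input leads into the target for every controllable input). All names and the example game are hypothetical.

```python
def upre(candidates, target, delta, unc_inputs, con_inputs):
    """Uncontrollable predecessors of target among the candidate states."""
    return {s for s in candidates
            if any(all(delta(s, u, c) in target for c in con_inputs)
                   for u in unc_inputs)}


def losing_region(states, unsafe, delta, unc_inputs, con_inputs):
    """Least fixpoint of S := S joined with UPRE(S), starting from unsafe."""
    current = set(unsafe)
    while True:
        # Only look for predecessors that are not already in the set,
        # then join them to the set.
        new = upre(states - current, current, delta, unc_inputs, con_inputs)
        if not new:
            return current
        current |= new


def delta(s, u, c):
    """Hypothetical transition function on states 0..3 (3 is unsafe)."""
    if s == 0:
        return 0 if c == 1 else 1   # the controller can stay in state 0
    if s == 1:
        return 2 if u == 1 else 0   # the environment can force state 2
    return 3                        # states 2 and 3 lead to the unsafe state


losing = losing_region({0, 1, 2, 3}, {3}, delta, unc_inputs=(0, 1), con_inputs=(0, 1))
```

In this example the controller wins from state 0 by always choosing c = 1, while states 1, 2 and 3 are losing.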


The basic abstract algorithm (A) implements synthesis
with a precomputed (monolithic) abstract transition
relation, and some additional optimizations.

The alternative abstract algorithm (A-TL) avoids using
a precomputed transition relation by implementing
abstract operators for post-state computation.

AbsSynthe was intended to compete in these different
configurations. However, due to a miscommunication
between tool authors and competition organizers,
the necessary command line parameters were not used,
such that only one configuration participated, namely
(C-TL). Unfortunately, this error was discovered too late
to run the additional configurations before the presentation
of results at CAV 2014.

However, as mentioned in , the abstraction-based 
methods overall performed worse than the concrete al¬ 
gorithm (C-TL), and thus the fastest configuration did 
participate in the competition. 

Strategy extraction. Strategy extraction in AbsSynthe
uses the co-factor-based approach described in Section 2.3,
including the additional forward reachability
check. When extracting the circuit, every AIG node constructed
from the BDD representation is cached in order
to avoid duplicating parts of the circuit.

Implementation, availability. AbsSynthe is written
mostly in Python, and depends only on a simple AIG
library (fetched from the AIGER toolbox) and the
BDD package CUDD. The source code is available at
https://github.com/gaperez64/AbsSynthe.


6.2 Basil: BDD-based safety synthesis tool 


Basil was submitted by R. Ehlers from the University
of Bremen, and implements a BDD-based synthesis approach.
Basil competed in all subtracks.

http://fmv.jku.at/aiger/ Accessed February 2016.
http://vlsi.colorado.edu/~fabio/CUDD/ Accessed February 2016.

Table 1. Optimizations implemented in BDD-based Tools.

Technique                                        | AbsSynthe | Basil | Realizer | Simple BDD Solver
automatic reordering                             | X         | X     | X        | X
eager deallocation of BDDs                       |           | X     |          | X
direct substitution                              | X         | (x)   | X        | X
partitioned transition relation                  | X         |       | X        | X
simultaneous conjunction and abstraction         |           |       |          | X
co-factor based extraction of winning strategies | X         | X     | N/A      | N/A
forward reachability analysis                    | X         | X     | N/A      | N/A
ABC minimization                                 |           | X     | N/A      | N/A
additional optimizations (see tool descriptions) | X         | X     | X        | X

Synthesis algorithm. Basil implements a BDD-based
synthesis algorithm, based on the BDD package CUDD.
It uses automatic reordering of BDDs with the sifting
heuristic, reconfigured in order to optimize more greedily.
In contrast to all other BDD-based tools in the competition,
it does not use a partitioned transition relation.
It does however use a technique similar to direct substitution,
regarding latches that are always updated by the
value of an input variable: BDD variables that represent
such inputs are double-booked as both an input and a
post-state variable of the latch, and therefore need not
be explicitly encoded into the transition relation. Additionally,
when building the transition relation it eagerly
deletes BDDs that are only used as intermediate values
as soon as they are not needed anymore. This is the case
if a gate A is not used as a controllable input or an input
to a latch, and all nodes that depend on A have been
processed.
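
The eager deletion of intermediate results can be sketched with simple use counters. The Python code below is not Basil's implementation: it only tracks which gate results are still live instead of building BDDs, the function name and the `keep` parameter (gates whose results must be retained, e.g. because they feed a latch) are hypothetical, and the AND-gates are assumed to be listed in topological order.

```python
def process_with_eager_free(ands, keep):
    """Process AND-gates in order and free intermediate results early.

    ands -- list of (lhs, rhs0, rhs1) triples, assumed topologically sorted
    keep -- set of gate indices whose results must not be freed
    Returns (freed, live): the order in which results were freed and the
    set of results that are still live at the end.
    """
    gate_vars = {lhs for lhs, _, _ in ands}
    # Count how many gates still need each gate result as an operand.
    pending = {v: 0 for v in gate_vars}
    for _, rhs0, rhs1 in ands:
        for operand in {rhs0 - rhs0 % 2, rhs1 - rhs1 % 2}:
            if operand in pending:
                pending[operand] += 1
    live, freed = set(), []
    for lhs, rhs0, rhs1 in ands:
        live.add(lhs)  # a real tool would build the BDD for lhs here
        for operand in sorted({rhs0 - rhs0 % 2, rhs1 - rhs1 % 2}):
            if operand in pending:
                pending[operand] -= 1
                # Free the operand once all gates depending on it are done.
                if pending[operand] == 0 and operand not in keep:
                    live.discard(operand)
                    freed.append(operand)
    return freed, live


# Gate 10 depends on gates 8 and 6, and gate 8 depends on gate 6.
freed, live = process_with_eager_free([(6, 2, 4), (8, 6, 3), (10, 8, 6)], keep={10})
```

In the example, the results for gates 6 and 8 are released as soon as gate 10 has been processed, and only the result for gate 10 stays live.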

Strategy extraction. Basil computes strategies with
the co-factor-based approach from Section 2.3, including
forward reachability analysis and ABC minimization.

As an additional optimization, during strategy extraction
the output bit BDDs are reduced in size by applying
LICompaction [38]: A joint BDD for all output bit
BDDs is built and then, in a round-robin fashion over the
outputs, the size of the joint BDD is reduced by changing
a part of it that describes the behavior of a single
output bit in a way that makes the overall BDD smaller,
but yields behavior that is contained in the most general
strategy for winning the game. In order to minimize the
care set for this operation, a reachable-state computation
is performed before every step. When no further size
reduction is found to be possible, or some timeout has
been reached, optimization by LICompaction is aborted.

Implementation, availability. Basil is implemented
in C++ and depends on the BDD package CUDD, as
well as (optionally) ABC for strategy minimization. It
is currently not publicly available.


6.3 Demiurge 


Demiurge was submitted by R. Könighofer from Graz University of
Technology and M. Seidl from Johannes Kepler University Linz.
Demiurge implements incremental SAT- and QBF-based synthesis as
described in Section 2.4, as well as template-based synthesis with
QBF solving as described in Section 2.5. Demiurge competed in all
subtracks.


Synthesis algorithms. Demiurge implements different synthesis
algorithms in different back-ends, described in more detail in [13].

The learning-based back-end uses the incremental synthesis approach
to compute a winning region based on two competing SAT solvers to
compute and generalize states to be removed from the winning region
(algorithm LearnSat, with optimization RG enabled but optimization
RC disabled). Minisat version 2.2.0 is used as underlying SAT solver.

http://www.eecs.berkeley.edu/~alanmi/abc/ Accessed February 2016.

The parallel back-end implements the same method with three threads
refining the winning region in parallel. Two threads perform the work
of the learning-based back-end, one using Minisat 2.2.0 and the other
using Lingeling ats. The third thread generalizes existing clauses of
the winning region further by trying to drop more literals. Using
different solvers in the threads is beneficial because the solvers
can complement each other, sometimes yielding a super-linear speedup
[13].

The template-based back-end uses a QBF solver to compute a winning
region as instantiation of a template for a CNF formula over the
state variables. For SYNTCOMP, DepQBF 3.02 is used as QBF solver via
its API. Bloqqer, extended to preserve satisfying assignments, is
used as QBF preprocessor.

Demiurge contains more back-ends that are either experimental or did
not turn out to be particularly competitive, and therefore did not
enter the competition. This includes a re-implementation of the
technique of Morgenstern et al. [51], and an approach based on a
reduction to Effectively Propositional Logic (EPR). Details can be
found in [13].

Strategy extraction. Demiurge provides several methods for computing
strategies from the winning region. The algorithm used in the
competition uses a computational learning approach as proposed in
[30], but implemented with incremental SAT solving or incremental
QBF solving instead of BDDs. More precisely, it uses the SAT-based
learning method without the dependency optimization, with Lingeling
ats as SAT solver. ABC minimization is used in a post-processing
step.

Implementation, availability. Demiurge is implemented in C++, and
depends on a number of underlying reasoning engines, some of them
mentioned above. Because of its modular architecture, Demiurge is
easily extendable with new algorithms and optimizations. It thus
provides a framework for implementing new synthesis algorithms and
reduces the entry barrier for new research on SAT- and QBF-based
synthesis algorithms and optimizations.

Demiurge is available under the GNU LGPL license (version 3) at
http://www.iaik.tugraz.at/content/research/design_verification/demiurge/

6.4 Realizer: CUDD Based Safety Game Solver

Realizer was submitted by L. Tentrup from Saarland University,
Saarbrücken. Realizer implements BDD-based realizability checking,
and competed in both realizability subtracks. It does not support
extraction of strategies.

Synthesis algorithms. Realizer is based on the BDD package CUDD, and
uses automatic reordering of BDDs with the lazy sift reordering
scheme. When building the BDDs that represent the transition
relation, it uses a temporary hash table to save the BDDs for AND
gates in the AIG. Before starting the fixpoint algorithm, it builds
the basic data structures used in the fixpoint calculation, like the
arrays mapping the current state variable to the next (primed) state
variable or the BDD cubes used for the existential and universal
abstraction.
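The role of the temporary hash table can be sketched as follows. This is an illustration only, since Realizer is not publicly available: truth tables (tuples of booleans) stand in for BDDs, the function name `build_functions` and its parameters are hypothetical, and the literal numbering assumes the usual AIGER convention (literal 0 is constant false, an odd literal is the negation of the preceding even literal).

```python
from itertools import product


def build_functions(num_inputs, and_gates):
    """Return a lookup that maps an AIGER literal to its truth table.

    and_gates maps an even literal (a gate output) to its two input
    literals. Every gate is computed once and then served from the cache.
    """
    assignments = list(product([False, True], repeat=num_inputs))
    cache = {}  # temporary table: even literal -> truth table

    def table(lit):
        base = lit & ~1  # strip the negation bit
        if base not in cache:
            if base == 0:
                cache[base] = tuple(False for _ in assignments)
            elif base in and_gates:
                left, right = and_gates[base]
                cache[base] = tuple(
                    a and b for a, b in zip(table(left), table(right))
                )
            else:
                # Input literals 2, 4, ... refer to inputs 0, 1, ...
                index = base // 2 - 1
                cache[base] = tuple(a[index] for a in assignments)
        result = cache[base]
        return tuple(not v for v in result) if lit & 1 else result

    return table
```

Without the cache, a gate shared by many other gates would be rebuilt once per use, which for deep circuits can multiply the construction work considerably.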

The actual fixpoint algorithm is implemented in two variants,
differing only in the way they handle the forced predecessor
function: one variant uses a monolithic transition relation, while
the other uses a partitioned transition relation. Both variants use
direct substitution.

The variant with partitioned transition relation overall performed
better in preliminary experiments, so only this one was entered into
the competition in the sequential realizability track. Since on some
examples the other variant performed better, the parallel version
uses both variants running (independently) in parallel.
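Running two independent variants and taking whichever answers first is a simple portfolio scheme. The sketch below shows the idea with Python threads; the function name `run_portfolio` and the variant names in the usage note are hypothetical, and this is not Realizer's implementation. For CPU-bound solvers a real portfolio would more likely use separate processes.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_portfolio(variants, problem):
    """Run all solver variants on the same problem, return the first result.

    variants maps a variant name to a callable taking the problem.
    Returns (name of the variant that finished first, its result).
    """
    with ThreadPoolExecutor(max_workers=len(variants)) as pool:
        futures = {
            pool.submit(solve, problem): name
            for name, solve in variants.items()
        }
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        first = next(iter(done))
        return futures[first], first.result()
```

A call such as `run_portfolio({"monolithic": solve_a, "partitioned": solve_b}, game)` only pays off if the two callables really differ, which is the property the parallel entry lost to the bug described in the footnote.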

Implementation, availability. Realizer is written in Python and uses
the BDD library CUDD in version 2.4.2 with the corresponding Python
bindings PyCUDD in version 2.0.2. It is currently not publicly
available.

6.5 Simple BDD Solver 

Simple BDD Solver was submitted by L. Ryzhyk from NICTA, Sydney and
the Carnegie Mellon University, Pittsburgh, and A. Walker from NICTA,
Sydney. Simple BDD Solver implements BDD-based realizability
checking, and only competed in the sequential realizability subtrack.
It does not support extraction of strategies.

Synthesis algorithm(s). Simple BDD Solver is a substantial
simplification of the solver that was developed for the Termite
project, adapted to safety games given in the AIGER format. It uses
the BDD package CUDD, with dynamic variable reordering using the
sifting algorithm, and eager deallocation of BDDs. Furthermore, it
uses a partitioned transition relation, direct substitution, and
simultaneous conjunction and abstraction.

Analysis of results and subsequent inspection of the source code by
the tool author showed that, due to a bug, the parallel version of
Realizer did not work as intended, and instead used two threads with
identical strategy. As can be seen in the results section, this led
to a decreased performance overall.

http://termite2.org Accessed February 2016.

Additionally, it uses an alternative form for the fixpoint
computation of UPRE that avoids creating the BAD latch and
simplifies the quantification structure:

The fixpoint formula in Section 2.3 is equivalent to

$$\mu S(L).\ \exists X_u\, \forall X_c\, \exists L'.\ \big(S(L') \wedge T(L, X_u, X_c, L')\big) \vee BAD'.$$

To avoid introducing the latch for BAD, we substitute BAD' with the
update function for BAD, an expression $\neg SAFE(L, X_u, X_c)$ over
latches and inputs. This results in:

$$\mu S(L).\ \exists X_u\, \forall X_c\, \exists L'.\ \big(S(L') \wedge T(L, X_u, X_c, L')\big) \vee \neg SAFE(L, X_u, X_c).$$

Then, quantifiers are re-arranged to

$$\mu S(L).\ \exists X_u\, \forall X_c.\ \big(\exists L'.\ S(L') \wedge T(L, X_u, X_c, L')\big) \vee \neg SAFE(L, X_u, X_c),$$

with the safety condition outside of the innermost existential
quantification. With this formula for the fixpoint, simultaneous
conjunction and abstraction can be used on the left-hand side of the
disjunction, and we avoid building the potentially large BDD of the
conjunction in the left-hand side at every iteration.
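The re-arranged fixpoint can be made concrete on an explicit-state game. The sketch below is not the Haskell implementation of Simple BDD Solver: it enumerates states and input values instead of using BDDs, assumes a deterministic transition function `delta` (so that the inner quantification over L' reduces to a membership test), and all names are chosen for this illustration.

```python
def losing_region(states, xu_values, xc_values, delta, safe):
    """Least fixpoint of states from which the environment can force
    a safety violation.

    delta(s, xu, xc) returns the successor state; safe(s, xu, xc) is the
    safety condition over the current state and inputs.
    """
    losing = set()
    while True:
        updated = {
            s
            for s in states
            # exists X_u . forall X_c . (successor in S) or not SAFE
            if any(
                all(
                    delta(s, xu, xc) in losing or not safe(s, xu, xc)
                    for xc in xc_values
                )
                for xu in xu_values
            )
        }
        if updated == losing:
            return losing
        losing = updated
```

A specification is then realizable exactly if the initial state is not in the returned set. Note that the safety test sits outside the successor test, mirroring the position of the negated SAFE term outside the innermost existential quantifier.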

Furthermore, the tool implements a variant of the 
fixpoint algorithm with an abstraction-refinement loop 
inspired by |26| . Since this variant has not been found to 
be competitive on the set of competition benchmarks, it 
has not been entered into the competition. 

Implementation, availability. Simple BDD Solver is written in the
Haskell functional programming language. It uses the CUDD package
for BDD manipulation and the Attoparsec Haskell package for fast
parsing. Altogether, the solver, AIGER parser, compiler and command
line argument parser are just over 300 lines of code. The code is
available online at: https://github.com/adamwalker/syntcomp


7 Execution 

SYNTCOMP 2014 used a compute cluster of identical machines with
octo-core Intel Xeon processors (2.0 GHz) and 64 GB RAM, generously
provided by Graz University of Technology. The machines are running
a GNU/Linux system, and submitted solvers were compiled using GCC
version 4.7. Each node has a local 400 GB hard drive that can be used
as temporary storage.

https://hackage.haskell.org/package/attoparsec Accessed February 2016.

The competition was organized on the EDACC platform developed for
the SAT Competitions [43]. EDACC directly supports the definition of
subtracks with different benchmark sets, different solver
configurations, verification of outputs, and automatic distribution
of jobs to compute nodes. During the competition, a complete node was
reserved for each job, i.e., one synthesis tool (configuration)
running one benchmark. This ensures a very high comparability and
reproducibility of our results. Olivier Roussel’s runsolver was used
to run each job and to measure CPU time and Wall time, as well as
enforcing timeouts. As all nodes are identical and no other tasks
were run in parallel, no limits other than a timeout per benchmark
(CPU time for sequential subtracks, wall time for parallel subtracks)
were set. The timeout for each task in any subtrack was 5000 seconds
(CPU time or wall time, respectively). The queueing system in use is
TORQUE.

Some solvers did not conform completely to the output format
specified by the competition, e.g. because extra information was
displayed in addition to the specified output. For these solvers,
small wrapper scripts were used to execute them, filtering the
outputs so as to conform to the specified format.
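A wrapper of this kind can be very small. The following sketch is not one of the scripts used in the competition; it assumes, purely for illustration, that the specified output is a single verdict line reading either REALIZABLE or UNREALIZABLE, and the function name is hypothetical.

```python
def filter_output(raw_output, allowed=("REALIZABLE", "UNREALIZABLE")):
    """Return the first line that is exactly an allowed verdict, else None.

    Everything else the solver printed (debug messages, statistics) is
    dropped, so that only the verdict is passed on.
    """
    for line in raw_output.splitlines():
        stripped = line.strip()
        if stripped in allowed:
            return stripped
    return None
```

In a real wrapper the raw text would come from the solver process, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.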

Validity of results. Beyer et al. recently noted that runsolver,
along with a number of other benchmarking tools, has certain deficits
that endanger the validity of results. In particular, the CPU time of
child processes may not be measured correctly. First, note that CPU
time is only relevant for our results in the sequential subtracks,
where tools are restricted to a single CPU core. Furthermore, for the
participants of SYNTCOMP we note that the only child processes (if
any) are the reasoning engines for BDDs and SAT or QBF formulas.
Since these reasoning engines take up almost all of the CPU time in
solving synthesis tasks, a comparison of CPU time to the recorded
wall time would most probably reveal measurements that exclude child
processes. This was not the case for our results.
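The plausibility check described above amounts to comparing two numbers per job. A minimal sketch, assuming each measurement is available as a (job, CPU seconds, wall seconds) triple; the thresholds `min_ratio` and `min_wall` are illustrative choices and not values used in the competition:

```python
def suspicious_runs(runs, min_ratio=0.5, min_wall=1.0):
    """Flag sequential jobs whose CPU time is far below their wall time.

    On a dedicated node, a single-threaded job should accumulate CPU
    time at roughly the rate of wall time. A much lower CPU time hints
    at work done in child processes that was not accounted for.
    """
    flagged = []
    for job, cpu, wall in runs:
        # Very short runs are skipped: startup noise dominates them.
        if wall >= min_wall and cpu < min_ratio * wall:
            flagged.append(job)
    return flagged
```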


8 Experimental Results and Analysis 


We present the results of SYNTCOMP 2014, separated into
realizability and synthesis tracks, followed by some observations on
the state of the art. 5 tools entered the competition and ran in 8
different configurations in the 4 tracks of the competition. All of
the results can be viewed online in our EDACC system at
https://syntcompdb.iaik.tugraz.at/2014/experiments/. Furthermore, the
full experimental data, including problem instances, executable code
of the solvers, logfiles of executions, solutions produced by
solvers, and executable code for verifying the solutions, is
available in directory ExperimentalData2014 of our public Git
repository at https://bitbucket.org/swenjacobs/syntcomp/

http://www.adaptivecomputing.com/products/open-source/torque/
Accessed February 2016.

As mentioned in Section 6.1, AbsSynthe was supposed to compete in
different configurations, but due to a miscommunication was always
started in the same configuration. The results presented here for the
relative ranking differ from those presented at CAV 2014 in that only
one of the three identical configurations of AbsSynthe is counted in
the sequential tracks.


8.1 Realizability Track

In the realizability track, tools were run on the full set of 569
benchmarks. All 5 tools that entered SYNTCOMP competed with at least
one configuration in the sequential subtrack, and 4 of them also
competed in the parallel subtrack.

Sequential Subtrack. The sequential realizability track had 6
participants: AbsSynthe, Basil, Realizer and Simple BDD Solver
competed with one configuration each, whereas Demiurge competed with
two different configurations. Table 2 contains the number of
instances solved within the timeout per tool, the number of instances
solved uniquely by a solver, and the accumulated points per tool
according to our relative ranking scheme.

No tool could solve all 569 benchmarks, and 13 benchmarks were not
solved by any of the tools within the timeout. 12 benchmarks were
solved uniquely by one tool:

- Basil: 4 versions of the factory assembly benchmarks (size 5x5 and
  7x5, each with 10 and 11 errors)

- Realizer: gb_s2_r2_1_UNREAL

- Demiurge (templ): mult1x with x ∈ {2, 3, 4, 5, 6}, stay18n and
  stay20n

The benchmark gb_s2_r2_1_UNREAL was found to be realizable by the
tool, although it was classified as unrealizable by the benchmark
authors. Our analysis confirmed it to be realizable.

Table 3 gives an overview of benchmark instances that were solved
uniquely or not solved at all, for all subtracks of the competition.

Furthermore, Figure 2 gives an overview of solved instances by
participant and benchmark classes (see Section 4), and Figure 3 a
cactus plot for the time needed by each tool to solve the
benchmarks.

Analysis. Table 2 and Figure 2 show that the BDD-based tools
AbsSynthe, Basil, Realizer and Simple BDD Solver are very close to
each other when only comparing the number of instances solved.
Furthermore, for the AMBA and Genbuf benchmarks, some tools solve all
instances in the benchmark set, i.e., we would need more difficult
instances to distinguish which tool is better for these classes.
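The counts of uniquely solved and unsolved benchmarks reported for this subtrack follow mechanically from the per-tool sets of solved instances. A minimal sketch of that bookkeeping; the function name and the toy data in the usage note are made up for illustration and are not competition results:

```python
from collections import Counter


def unique_and_unsolved(solved_by_tool, benchmarks):
    """Derive uniquely solved instances per tool and globally unsolved ones.

    solved_by_tool maps a tool name to the set of benchmarks it solved.
    """
    counts = Counter(
        b for solved in solved_by_tool.values() for b in set(solved)
    )
    unique = {
        tool: sorted(b for b in set(solved) if counts[b] == 1)
        for tool, solved in solved_by_tool.items()
    }
    unsolved = sorted(b for b in benchmarks if counts[b] == 0)
    return unique, unsolved
```

For example, with `{"tool_a": {"b1", "b2"}, "tool_b": {"b2", "b3"}}` over benchmarks `b1` to `b4`, `tool_a` uniquely solves `b1`, `tool_b` uniquely solves `b3`, and `b4` remains unsolved.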






Fig. 3. Sequential Realizability Track, Cactus Plot
[Plot: time (s) needed by each tool to solve the benchmarks.]


Table 2. Results of the Sequential Realizability Track

| Tool | Solved | Unique | Relative |
|---|---|---|---|
| Simple BDD Solver | 542 | 0 | 262 |
| Realizer | 539 | 1 | 229 |
| AbsSynthe | 536 | 0 | 144 |
| Basil | 520 | 4 | 209 |
| Demiurge (learn) | 359 | 0 | 209 |
| Demiurge (templ) | 121 | 7 | 90 |

The best result in each column is in bold.


Regarding the SAT- and QBF-based synthesis approaches, Demiurge
(learn) solves about as many of the LTL2AIG benchmarks as the best
BDD-based tools, and almost as many of the Toy Examples. For the AMBA
and Genbuf benchmarks, Demiurge (learn) solves only about half as
many benchmarks, and for the Moving Obstacle and Factory Assembly
benchmarks it can only solve one in each case. Finally, Demiurge
(templ) can solve a very good number of the Toy Examples, and even
solves 7 of them uniquely. However, it solves only very few of the
LTL2AIG benchmarks, and none of the others.

Table 3. Benchmark instances that were solved uniquely or not solved
at all in at least one subtrack. “Yes” means solved by more than one
tool, “Yes*” means uniquely solved. “-” means that this benchmark
instance was not tested.

| Benchmark | Solved in Seq. Realizability | Solved in Par. Realizability | Solved in Seq. Synthesis | Solved in Par. Synthesis |
|---|---|---|---|---|
| amba8c7y | Yes | Yes | No | No |
| amba9c5y | Yes | Yes | Yes | Yes* |
| amba10c5y | Yes | Yes | Yes | Yes* |
| cnt30n | No | No | - | - |
| cnt30y | No | No | No | No |
| factory_assembly_5x5_2_10errors | Yes* | Yes* | No | No |
| factory_assembly_5x5_2_11errors | Yes* | Yes* | No | No |
| factory_assembly_7x5_2_10errors | Yes* | Yes* | No | No |
| factory_assembly_7x5_2_11errors | Yes* | Yes* | No | No |
| gb_s2_r2_1_UNREAL | Yes* | No | - | - |
| gb_s2_r2_2_REAL | No | No | No | No |
| gb_s2_r2_3_REAL | No | No | No | No |
| gb_s2_r2_4_REAL | No | No | No | No |
| moving_obstacle_24x24_7glitches | Yes | Yes | No | No |
| moving_obstacle_32x32_11glitches | Yes | Yes | No | No |
| moving_obstacle_48x48_19glitches | Yes | Yes | No | No |
| moving_obstacle_64x64_27glitches | Yes | Yes | No | No |
| moving_obstacle_96x96_43glitches | Yes | Yes | No | No |
| moving_obstacle_128x128_59glitches | No | No | - | - |
| moving_obstacle_128x128_60glitches | No | No | - | - |
| mult11 | Yes | Yes* | - | - |
| mult12 | Yes* | No | No | No |
| mult13 | Yes* | No | - | - |
| mult14 | Yes* | No | - | - |
| mult15 | Yes* | No | - | - |
| mult16 | Yes* | No | No | No |
| stay16y | Yes | Yes* | Yes | Yes* |
| stay18n | Yes* | No | - | - |
| stay18y | No | No | - | - |
| stay20n | Yes* | No | - | - |
| stay20y | No | No | - | - |
| stay22n | No | No | - | - |
| stay22y | No | No | - | - |
| stay24n | No | No | - | - |
| stay24y | No | No | - | - |

As can be seen in Figure 3, most tools have a steep degradation with
higher complexity of benchmarks, i.e., between 90% and 95% of the
benchmarks that can be solved within the timeout of 5000 seconds can
actually be solved very quickly, i.e., in less than 600 seconds.
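Both the cactus plot and the share of quickly solved instances can be derived from a list of per-benchmark runtimes. The sketch below shows one way to do this; the function names are chosen for this example, and representing an unsolved instance as `None` is an assumption about the data layout, not the EDACC format.

```python
def cactus_points(runtimes, timeout=5000.0):
    """Return (number of solved instances, time) pairs for a cactus plot.

    Runtimes are sorted, so the k-th point gives the time within which
    the tool solves its k easiest instances individually.
    """
    solved = sorted(t for t in runtimes if t is not None and t <= timeout)
    return [(rank, t) for rank, t in enumerate(solved, start=1)]


def share_solved_quickly(runtimes, quick=600.0, timeout=5000.0):
    """Fraction of the solved instances that needed less than `quick` seconds."""
    solved = [t for t in runtimes if t is not None and t <= timeout]
    if not solved:
        return 0.0
    return sum(1 for t in solved if t < quick) / len(solved)
```

A steep degradation shows up as a curve that stays flat for most of its length and then rises sharply over the last few points.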

The uniquely solved benchmarks also show that there are significant
differences between the algorithms of different tools. In particular,
the template-based variant of Demiurge, while not very successful
overall, can determine realizability for a relatively large number of
Toy Examples that cannot be solved by the other approaches.

Regarding the relative ranking, in Table 2 we note that Demiurge
(learn) has the same score as Basil (and higher than AbsSynthe), even
though it can only solve 359 problems, compared to 520 for Basil and
536 for AbsSynthe. This is because this ranking rewards Demiurge for
being one of the fastest tools on many of the small problem
instances.

Parallel Subtrack. The parallel realizability subtrack had 4
participants: parallel versions of Demiurge and Realizer, and
sequential versions of AbsSynthe and Basil. The results are
summarized in Table 4. Again, no tool could solve all 569 benchmarks.
Table 3 shows 21 benchmarks that were not solved by any of the tools
within the timeout, and 6 benchmarks that were solved uniquely by one
tool. The successful tools are:

- Basil: 4 versions of the factory assembly benchmarks (the same as
  before)

- AbsSynthe: mult11 and stay16y

We note that Realizer could not solve the problems that it uniquely
solved in sequential execution mode. Furthermore, Demiurge (templ)
did not compete in the parallel subtrack, therefore its uniquely
solved problems from the sequential subtrack are unsolved here.

Due to a bug, the parallel version of Realizer performed worse than
the sequential version, as mentioned in Section 6.4.

Table 4. Results of the Parallel Realizability Track

| Tool | Solved | Unique | Relative |
|---|---|---|---|
| Realizer | 538 | 0 | 279 |
| AbsSynthe | 536 | 2 | 219 |
| Basil | 520 | 4 | 331 |
| Demiurge (parallel) | 324 | 0 | 133 |

The best result in each column is in bold.

Table 5. Results of the Sequential Synthesis Track

| Tool | Solved | Relative | Quality | MCTO |
|---|---|---|---|---|
| AbsSynthe | 143 | 329 | 265 | 6 |
| Demiurge (learn) | 121 | 379 | 240 | 0 |
| Basil | 117 | 218 | 219 | 5 |
| Demiurge (templ) | 31 | 77 | 57 | 0 |

The best result in each column is in bold.


A cactus plot for the number of benchmarks that can be solved by
each tool within the timeout is given in Figure 4. We do not give a
detailed analysis of the number of solved instances by category,
since it is very similar to the analysis in Figure 2.

Analysis. Figure 4 shows the same steep degradation of runtime with
increasing complexity as in the sequential case. Concerning the
effectiveness of parallel versus sequential implementations, this
subtrack shows that currently none of the implementations benefits
from using parallelism. On the contrary, the parallel implementations
of Demiurge (learn) and Realizer were not able to solve (in 5000s
Wall time) the problems that their sequential implementations solved
uniquely in the sequential subtrack (in 5000s CPU time), and
AbsSynthe and Basil are sequential implementations.

Regarding the relative ranking based on Wall time, we note another
weakness of the chosen ranking system: this ranking heavily favors
implementations in C++ that have a quick startup time. Basil solves
150 problems in less than 0.36s Wall time, which is the minimal time
needed to solve any single problem for the Python-based
implementations AbsSynthe and Realizer. That is, the relative ranking
scheme and our benchmark selection favor tools that can solve easy
benchmarks very quickly, and in particular the tools implemented in
C++, as they make very efficient use of Wall time.

Summing up, we see that in the realizability tracks the BDD-based
tools in general outperform the SAT- and QBF-based approaches, except
for a small subset of the benchmarks. Between the BDD-based
approaches, the differences we could detect are rather small: the
percentage of benchmarks that can be solved by the BDD-based
implementations ranges only from 91% to 95%. None of the tools
benefits from parallelism.


8.2 Synthesis Track 


In the synthesis track, tools were evaluated with respect to the
relative and quality rankings (see Section 5.2), based on the size of
solutions. Since these rankings are only defined on realizable
specifications, we excluded all unrealizable specifications.
Furthermore, we excluded most of the problems that could not be
solved by any tool in the realizability track, since synthesis in
general is harder than realizability checking. Out of the remaining
382 benchmarks, we chose 157 benchmarks with the goal to ensure a
good coverage of different benchmark classes, and a good distribution
over benchmarks of different difficulty. Only 3 out of the 5 tools
that entered SYNTCOMP competed in the synthesis track: AbsSynthe,
Basil and Demiurge.

Sequential Subtrack. The sequential synthesis subtrack had 4
participants: AbsSynthe and Basil competed with one configuration
each, and Demiurge with two different configurations. Table 5 shows
the number of solved instances, and the accumulated points per tool
in the relative and quality rankings. Note that a problem only counts
as solved if the solution is successfully model checked. The number
of model checking timeouts (MCTO) is also given in the table.

No tool could solve all 157 benchmarks. No benchmark was solved
uniquely by one tool, and 14 benchmarks were solved by none of the
tools (see Table 3). AbsSynthe solved the highest number of problems
and earns the highest score in the quality ranking. Demiurge (learn)
earns the highest score in our relative ranking, even though it
solves fewer problems than AbsSynthe.

Both AbsSynthe and Basil produced a small number of solutions that
could not be model checked within the separate 3600s timeout. While
counting these additional solutions would have changed the scores of
these tools, it would not have changed the order of tools in any of
the rankings. Figures 5 and 6 give an overview of the size of
produced implementations for a subset of the benchmarks, showing
significant differences in implementation sizes for some instances,
in particular from the AMBA and Genbuf classes.

Fig. 4. Parallel Realizability Track, Cactus Plot
[Plot: AbsSynthe, Basil, Demiurge (parallel), Realizer (parallel);
axis: # benchmarks solved.]

Fig. 5. Comparison of implementation sizes for a subset of the AMBA
and GenBuf benchmarks
[Bar chart: size (# AND gates, logarithmic scale) per benchmark for
AbsSynthe, Basil, Demiurge (learn), Demiurge (parallel).]

Fig. 6. Comparison of implementation sizes for a subset of the toy
example benchmarks
[Bar chart: size (# AND gates, logarithmic scale) per benchmark for
AbsSynthe, Basil, Demiurge (learn), Demiurge (parallel), Demiurge
(templ).]

Analysis. Regarding the relative and quality rankings, we note that
Demiurge (learn) profits from taking solution sizes into account.
Figure 5 shows that for those instances of the AMBA and GenBuf
benchmarks that Demiurge (learn) can solve, it provides
implementations that are often by an order of magnitude smaller than
those of the other tools. Figure 6 shows a number of benchmarks where
the implementation sizes are equal or very similar. Most of the time,
the solutions of Demiurge (learn) are smaller than those of AbsSynthe
and Basil, which is why it scores higher than AbsSynthe and much
higher than Basil in the relative ranking, even though it solves
fewer problems than AbsSynthe, and about as many as Basil. In the
quality ranking, the relative difference between Demiurge (learn) and
AbsSynthe is significantly smaller than in the number of solved
instances (9.5% versus 15.5% difference).

Furthermore, we note that the benchmark set contains relatively many
problems that are easy to solve. For example, AbsSynthe can solve 75
of the 157 problems in less than 0.5s CPU time.

Comparing the BDD-based tools, AbsSynthe solves a number of problems
that Basil cannot solve, and provides smaller solutions in many
cases.

Parallel Subtrack. The parallel synthesis subtrack had 3
participants: one configuration each of AbsSynthe, Basil, and
Demiurge. Demiurge (parallel) was the only tool to use parallelism in
the synthesis track. The results are summarized in Table 6. No tool
could solve all 157 benchmarks, 3 benchmarks were solved uniquely by
one tool, and 14 benchmarks were solved by none of the tools.

The benchmarks solved uniquely by one tool are:

- AbsSynthe: amba9c5y, amba10c5y, and stay16y.

Table 6. Results of the Parallel Synthesis Track

| Tool | Solved | Relative | Quality | MCTO |
|---|---|---|---|---|
| AbsSynthe | 143 | 352 | 266 | 6 |
| Demiurge (parallel) | 119 | 393 | 237 | 0 |
| Basil | 117 | 235 | 196 | 5 |

The best result in each column is in bold.

Like in the sequential subtrack, both AbsSynthe and Basil produced a
small number of solutions that could not be model checked.
Implementation sizes for Demiurge (parallel) are included in Figures
5 and 6, showing that in some cases the implementations are even
smaller than those obtained from Demiurge (learn), in particular for
the AMBA benchmarks.

Analysis. Like in the sequential synthesis subtrack, Demiurge profits
from providing small solutions, even though it solves fewer problems
than its competitors. Furthermore, we note that Demiurge in this case
profits from parallelism to some extent. While it solved 2 problems
fewer than the sequential Demiurge (learn), the solutions provided by
Demiurge (parallel) were in some cases even smaller than those
provided by the sequential version.

8.3 Observations on the State of the Art


BDD-based Synthesis. The standard BDD-based fixpoint algorithm for
solving safety games is currently the most efficient way for
realizability checking based on monitor circuits. Implementations of
the algorithm build on existing BDD packages, including operations
for composition, abstraction, and dynamic reordering of BDDs. Based
on these complex BDD operations, a competitive implementation can be
fairly simple, as can be seen for example in Simple BDD Solver, which
only consists of about 300 lines of code. A few optimizations seem to
be crucial, like automatic reordering, partitioned transition
relations, and direct substitution. For other optimizations, like
eager deallocation of BDDs or simultaneous conjunction and
abstraction, we have mixed results: the tool authors that implemented
them report increased efficiency, but we also have competitive tools
that do not implement them.

A drawback of BDD-based synthesis becomes apparent when comparing the
size of solutions to those of Demiurge (learn): in many cases, the
produced implementations are much larger than necessary.

As can be expected, a deeper analysis of the runtime behavior of
BDD-based tools shows that most of the time is spent manipulating
BDDs, in particular in the automatic reordering operations.
Therefore, it can be expected that the performance of BDD-based
implementations heavily depends on the performance of the used BDD
package. Since all of the tools in SYNTCOMP 2014 use the same BDD
package, the results of the competition do not shed light on this
issue, however.

Template-based Synthesis. The template-based algorithm implemented
in Demiurge (templ) only solves a small subset of the benchmark set;
a closer analysis shows that it only performs well if a simple CNF
representation of the winning region exists, which applies only to
few SYNTCOMP benchmarks. Hence, its performance on average is rather
poor. However, this approach solves large instances of the mult, cnt
and stay benchmarks much faster than the competition, or solves them
uniquely.


Learning-based Synthesis. The learning-based algorithm implemented
in Demiurge (learn) solves far more benchmarks than the
template-based algorithm: 62% of the benchmarks instead of 21% in the
sequential realizability track. Still, the approach cannot really
compete with the BDD-based tools, which solve more than 90%. In the
parallel realizability track, the situation is similar.

In the synthesis tracks, which are restricted to realizable problems
and have rankings that take into account the size of solutions,
Demiurge (learn) performed much better. Here, it solves 77% of the
benchmarks, compared to 78% for Basil and 95% for AbsSynthe (before
model checking). Additionally, the learning-based algorithm produces
circuits that are sometimes several orders of magnitude smaller than
those produced by the BDD-based tools. This is also highlighted by
the fact that all solutions of Demiurge (learn) are successfully
model checked, while both AbsSynthe and Basil produce a number of
solutions that cannot be verified within the timeout.
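The percentages "before model checking" can be reproduced from Table 5 under one reading of that phrase, namely that solutions lost to a model checking timeout (MCTO) are added back to the verified ones. This reading is an assumption of the sketch below, as are the names used in it; the numbers are taken from Table 5.

```python
BENCHMARKS = 157

# (verified solutions, model checking timeouts) per tool, from Table 5
RESULTS = {
    "AbsSynthe": (143, 6),
    "Basil": (117, 5),
    "Demiurge (learn)": (121, 0),
}


def percent_before_model_checking(verified, mc_timeouts, total=BENCHMARKS):
    """Share of benchmarks with a produced solution, verified or not."""
    return round(100 * (verified + mc_timeouts) / total)


shares = {
    tool: percent_before_model_checking(verified, timeouts)
    for tool, (verified, timeouts) in RESULTS.items()
}
```

Under this reading Basil overtakes Demiurge (learn) only through its 5 unverified solutions, which is why the text stresses that all solutions of Demiurge (learn) were model checked successfully.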


Parallel Subtracks. The submitted tools in general do
not use parallelization very efficiently. The parallel
version of Realizer performs worse than the sequential
version due to a bug. For the parallel version of Demiurge,
the result is double-edged: on the one hand, the parallel
version solves two fewer problems than the sequential
version; on the other hand, the solutions it provides are
often even smaller than the ones produced by the
sequential version.


For BDD-based tools, the lack of efficient parallel
implementations correlates with the lack of efficient
parallelized operations in BDD packages. While there have
been recent efforts to parallelize BDD operations,
the resulting package does not support the important
automatic reordering of BDDs, which makes it hard to
integrate into a technique that heavily relies on reordering.

9 Conclusions and Future Plans


SYNTCOMP 2014 was a big success, making the first
step towards establishing the competition as a regular
event and its benchmark format as a standard language
in the synthesis community. A number of synthesis
tools have been developed specifically for the
competition (AbsSynthe, Basil, Realizer), while others are new
versions or modifications of existing tools (Demiurge,
Simple BDD Solver). Recently, the competition format
has also been adopted by tool developers that have thus
far not participated in SYNTCOMP [22]. Furthermore,
the competition has sparked a lively discussion on the
implementation of efficient synthesis techniques, in
particular making tool developers aware of the range of
optimizations used in BDD-based synthesis algorithms, and
of alternative SAT- and QBF-based approaches that are
competitive at least on some classes of benchmarks.

At the time of this writing, SYNTCOMP 2015 has
already been held. For the second iteration of the
competition, we have expanded the benchmark set to more
challenging benchmarks, and to a wider range of
different benchmark classes. Additionally, following ideas of
Sutcliffe and Suttner, we have developed a classification
scheme for benchmarks in terms of difficulty, based
on the results of SYNTCOMP 2014. Using this
classification, in SYNTCOMP 2015 we selected benchmarks to
balance the weight of benchmark instances from different
classes and different difficulties.

Finally, recall that SYNTCOMP 2014 (and 2015) was
restricted to the synthesis of finite-state systems from
pure safety specifications in AIGER format. On the one
hand, this resulted in a low entry barrier for the
competition and revived interest in the synthesis from pure
safety specifications, as witnessed by several new tools
and research papers related to the competition.
On the other hand, many of the existing synthesis tools
did not participate because their strengths are in
different kinds of synthesis tasks, for example in the synthesis
from specifications in richer specification languages such
as GR(1) or LTL. Thus, many interesting synthesis
approaches are currently not covered by the competition.
For SYNTCOMP 2016, we plan to extend the
competition to a specification format that includes both GR(1)
and LTL specifications.